Abstract 

Bundle methods have been intensively studied for solving both convex and nonconvex optimization 
problems. In most of the bundle methods developed thus far, at least one quadratic programming (QP) 
subproblem needs to be solved in each iteration. In this paper, we explore the feasibility of developing a 
bundle algorithm that only solves linear subproblems. We start from minimization of a convex function and 
show that the sequence of major iterations converges to a minimizer. For nonconvex functions we consider 
functions that are locally Lipschitz continuous and prox-regular on a bounded level set, and minimize the 
cutting-plane model over a trust region with infinity norm. The para-convexity of such functions allows us to 
use the locally convexified model and its convexity properties. Under some conditions and assumptions, we 
study the convergence of the proposed algorithm through the outer semicontinuity of the proximal mapping. 
Encouraging results of preliminary numerical experiments on standard test sets are provided. 

Keywords Nonconvex optimization, Nonsmooth optimization, Trust region method, Linear subproblem, Prox-regular 


1 Introduction 

As a generalization of nonlinear programming (NLP), nonsmooth optimization (NSO) has broader application 
areas and more theoretical challenges. Traced back to as early as 1962 [Mj, NSO has been studied intensively 
hitherto with more and more new methods being developed. Generally speaking, methods in unconstrained NSO 
fall into the following categories: subgradient methods ESHHE], gradient sampling methods HHS], modified 
Newton ED] or quasi-Newton m methods, proximal point methods ED, derivative free methods EZ], and bundle 
methods [H mini [23]. NSO has a much broader scope than NLP, and theory can be developed for very general 
function classes, e.g. Lipschitz continuous functions. In order to develop more efficient algorithms and stronger 
theoretical results, various special structures of the objective function are often assumed. Special algorithms 
have been developed, tailored for well-structured problems such as convex composite m, partially separable m, 
etc. Bundle methods are probably the most intensively researched methods for nonsmooth optimization. They were 
created independently by Claude Lemarechal [15] and Philip Wolfe ES] in 1975. Since then a great number of 
variants of bundle methods have been developed, such as proximal bundle cai, trust region bundle E3I1]> 
splitting bundle [7], and redistributed bundle |S]. Bundle methods grew out of cutting plane methods which 
often showed a great deal of instability (see page 276 of [D]). To correct this a better approximation of the whole 
subdifferential is made and stabilization of the descent step is also incorporated. 

Consider an optimization problem of the form 

$$\text{minimize } f(x) \tag{1}$$

where the objective function $f : \mathbb{R}^n \to \mathbb{R}$ is locally Lipschitz continuous. The traditional bundle method 
solves a parametric quadratic subproblem in each iteration to obtain a search direction with possible options to 
follow with a line search. The advantage of employing a quadratic subproblem is that the optimal solution is 
unique due to strict convexity; furthermore, the solution can be explicitly expressed in terms of the current 
iteration point and an average of some subgradients. However, this convenience must be weighed against the 
inconvenience of having to solve a QP at each iteration, which can be time-consuming, especially for large scale 
problems. One of the motivations of this paper is to overcome the necessity of solving a QP in each iteration. 
Hence, we propose a new version of the bundle algorithm for solving (1) with linear subproblems only. Surprisingly, 
one finds that even without the explicit expression of the solution, and even without line search, our algorithm 
can still converge to stationary points for nonconvex problems. 

Our approach uses two key tools to achieve this end. The first is a subdifferential approximation. Traditional 
bundle methods use a convex combination of subgradients (an average) to approximate a selection from an 
approximation of the whole subdifferential set (the subgradient selection technique). Traditional subgradient 
methods, including gradient sampling methods, use the minimum norm subgradient as a replacement for the gradient, 
a calculation involving the solution of a QP. In our method we use a trust region method based on a linear 
cutting plane model (of a related local convexification of $f$) and in the theoretical analysis we use 
$\frac{f(x) - f^*}{\|x - P(x)\|}$, where $f^*$ is the minimum value of $f$ and $P(x)$ is the projection of $x$ onto the optimal solution set, as a lower 
bound on the norm of the minimum norm element of the subdifferential. This alternative for the subdifferential 
approximation plays a significant role in our convergence analysis. Secondly, we use a local convexification 
technique developed in [5]. For nonconvex functions that are prox-regular and Lipschitz continuous, we show 
that there exists a number $a$ such that $f(\cdot) + \frac{a}{2}\|\cdot - x\|^2$ is a restriction, to a level set of $f$, of a globally convex 
function. Clearly, when $f$ is convex it suffices to take $a = 0$. Unlike [5], where the convexification parameter $a$ 
is eventually stabilized, we only need $a$ to be bounded. This allows us to exploit the outer semicontinuity of 
the subdifferential and that of the associated proximal mapping, while traditional bundle methods use the outer 
semicontinuity of the $\varepsilon$-subdifferential. 

The use of a linear model in a trust region method is a relatively new idea and our work follows that of 
[17], where a related approach is used to solve large scale optimization problems that arise out of stochastic 
programming. Standard theory for trust region methods uses quadratic model functions. This is usually justified 
via the imposition of a sufficient differentiability assumption on the objective function. Such logic is less compelling 
in the context of nonsmooth optimization. Furthermore, one usually associates a bundle-trust-region method 
with an approach that absorbs the trust region into a quadratic penalty to control the step length. We instead 
directly impose an infinity norm box that is handled in a more or less traditional manner for a trust region 
method. Unlike [17], we are able to handle a generic class of prox-regular, locally Lipschitz functions, greatly 
extending the applicability of this approach. 

This paper is organized as follows. Section 2 contains the properties of our objective function as preliminary 
knowledge and Section 3 includes a description of a version of bundle method with linear subproblem for convex 
optimization. In Section 4 we derive the method for nonconvex optimization and we analyse the convergence 
of the algorithm in Section 5. Preliminary numerical tests are presented in Section 6. 

In this paper, $\|\cdot\|$, $\|\cdot\|_1$ and $\|\cdot\|_\infty$ denote the two norm, one norm and infinity norm, respectively. Denote by 
$\mathrm{lev}_b f$ the lower level set of $f$ defined by $\{x \in \mathbb{R}^n \mid f(x) \le b\}$ and by $B(x, \varepsilon)$ the closed ball centered at $x$ with radius 
$\varepsilon$. The convex hull of a set $C \subseteq \mathbb{R}^n$ is denoted by $\mathrm{co}\, C$, the domain of a function $f$ is $\mathrm{dom}\, f$ and the interior of a 
set $C$ is $\mathrm{int}\, C$. 


2 Properties of the objective function 

For the reader’s convenience we collect in this section some standard definitions and properties we will be 
utilizing in our development. 

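Definition 2.1 (subdifferential). Let $f : \mathbb{R}^n \to \mathbb{R}$ be finite at $\bar{x}$. 

1. The set 

$$\hat{\partial} f(\bar{x}) := \left\{ s \in \mathbb{R}^n \;\middle|\; \liminf_{x \to \bar{x},\ x \ne \bar{x}} \frac{f(x) - f(\bar{x}) - \langle s, x - \bar{x} \rangle}{\|x - \bar{x}\|} \ge 0 \right\}$$

is called the Fréchet subdifferential or regular subdifferential of $f$ at $\bar{x}$, with elements called Fréchet 
subgradients or regular subgradients of $f$ at $\bar{x}$. When $|f(\bar{x})| = \infty$ then $\hat{\partial} f(\bar{x}) := \emptyset$. 

2. The set 

$$\partial f(\bar{x}) := \limsup_{x \to \bar{x},\ f(x) \to f(\bar{x})} \hat{\partial} f(x)$$

is called the basic subdifferential of $f$ at $\bar{x}$, with elements called basic 
subgradients of $f$ at $\bar{x}$. When $|f(\bar{x})| = \infty$ then $\partial f(\bar{x}) := \emptyset$. 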

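Definition 2.2 (prox-regularity, Definition 13.27, [22]). A function $f : \mathbb{R}^n \to \mathbb{R}$ is prox-regular at $\bar{x}$ for $\bar{v}$ with 
respect to $\varepsilon$ and $a$ if $f$ is finite and locally lower semicontinuous (l.s.c.) at $\bar{x}$ with $\bar{v} \in \partial f(\bar{x})$, and there exist 
$\varepsilon > 0$ and $a \ge 0$ such that 

$$f(x') \ge f(x) + \langle v, x' - x \rangle - \frac{a}{2}\|x' - x\|^2 \quad \forall\, x' \in B(\bar{x}, \varepsilon) \tag{2}$$

when $\|x - \bar{x}\| < \varepsilon$, $v \in \partial f(x)$, $\|v - \bar{v}\| < \varepsilon$, $f(x) < f(\bar{x}) + \varepsilon$. When this holds for all $\bar{v} \in \partial f(\bar{x})$, $f$ is said to be 
prox-regular at $\bar{x}$. 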

Definition 2.3 (para-convexity). Given a point $x \in \mathbb{R}^n$ and a real number $\varepsilon > 0$, a function $f : \mathbb{R}^n \to \mathbb{R}$ is 
para-convex on $B(x, \varepsilon)$ with respect to $a$ if there exists $a \ge 0$ such that the function $f(\cdot) + \frac{a}{2}\|\cdot\|^2$ is convex on 
$B(x, \varepsilon)$. 

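Definition 2.4 (proximal mapping). For a proper, l.s.c. function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ and a parameter $a > 0$, the 
Moreau envelope function $e_a f$ and proximal mapping (or proximal point mapping) $P_a f$ are defined by 

$$e_a f(x) := \inf_{w \in \mathrm{dom}\, f} \left\{ f(w) + \frac{a}{2}\|w - x\|^2 \right\}, \tag{3}$$

$$P_a f(x) := \arg\min_{w \in \mathrm{dom}\, f} \left\{ f(w) + \frac{a}{2}\|w - x\|^2 \right\}. \tag{4}$$

If the proper l.s.c. function $f$ is bounded from below then $P_a f(x)$ is nonempty and compact for all $(x, a) \in 
\mathbb{R}^n \times \mathbb{R}_{>0}$, and the mapping $\mathbb{R}^n \times \mathbb{R}_{>0} \ni (x, a) \mapsto e_a f(x)$ is continuous. From Definition 2.4 we see that 
if $p \in P_a f(x)$ then $f(p) \le e_a f(x)$ and, by the optimality condition $0 \in \partial f(p) + a(p - x)$, $a(x - p) \in \partial f(p)$. We also have $e_a f(x) \le f(x)$ for all $x \in \mathbb{R}^n$ and 
$e_a f(x) = f(x)$ if and only if $x \in P_a f(x)$. 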
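To make Definition 2.4 concrete, the following minimal sketch evaluates the Moreau envelope and the proximal mapping for the one-dimensional example $f(w) = |w|$. The example function, the closed-form soft-thresholding expression and all function names are illustrative assumptions and are not taken from the paper; the grid search is only a crude numerical cross-check.

```python
def f(w: float) -> float:
    """Illustrative objective (an assumption): the absolute value, nonsmooth at 0."""
    return abs(w)


def prox_abs(x: float, a: float) -> float:
    """Proximal point of |.|: argmin_w |w| + (a/2)(w - x)^2, i.e. soft-thresholding by 1/a."""
    t = 1.0 / a
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def moreau_env_abs(x: float, a: float) -> float:
    """Moreau envelope e_a f(x), evaluated at the proximal point."""
    p = prox_abs(x, a)
    return f(p) + 0.5 * a * (p - x) ** 2


def moreau_env_grid(x: float, a: float, lo: float = -5.0, hi: float = 5.0, n: int = 20001) -> float:
    """Brute-force approximation of e_a f(x) on a uniform grid, used only as a sanity check."""
    step = (hi - lo) / (n - 1)
    return min(f(lo + i * step) + 0.5 * a * (lo + i * step - x) ** 2 for i in range(n))


if __name__ == "__main__":
    x, a = 2.0, 2.0
    p = prox_abs(x, a)
    # a * (x - p) should be a subgradient of |.| at p (here p > 0, so it should equal 1).
    print(p, moreau_env_abs(x, a), a * (x - p))
```

In this example the envelope value never exceeds $f(x)$, and the two coincide exactly at the fixed point $x = 0$ of the proximal mapping, in line with the properties listed above.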

Definition 2.5 (outer semicontinuity). A set valued mapping $S : \mathbb{R}^n \rightrightarrows \mathbb{R}^m$ is outer semicontinuous at $\bar{x}$ if 

$$\{ u \mid \exists\, x^\nu \to \bar{x},\ \exists\, u^\nu \to u \text{ with } u^\nu \in S(x^\nu) \} = S(\bar{x}).$$

We note that both the subdifferential and the proximal mapping are outer semicontinuous. Additionally, the mapping 
$\mathbb{R}^n \times \mathbb{R}_{>0} \ni (x, a) \mapsto P_a f(x)$ is outer semicontinuous. 

The following proposition is from lemma 2.2 of 

Proposition 1. If a function $f : \mathbb{R}^n \to \mathbb{R}$ is locally Lipschitz continuous and prox-regular at $x$ then there exist 
$\varepsilon$ and $a$ such that $f$ is para-convex on $B(x, \varepsilon)$ with respect to $a$. 

Remark. If $f : \mathbb{R}^n \to \mathbb{R}$ is a convex function then it is useful to note that 

$$f(x) = \sup \{ f(y) + s^T (x - y) \mid y \in \mathbb{R}^n,\ s \in \partial f(y) \}. \tag{5}$$

3 LP-bundle method : The convex case 

Consider minimizing a convex function $f$ on $\mathbb{R}^n$. Denote by $S$ the set of minimizers of $f$. Then $S$ is closed and 
convex. Assuming $S$ is nonempty, the projection operator $P(\cdot)$ onto $S$ is well defined. In [17], a bundle 
trust-region method was proposed to solve a two-stage stochastic linear programming problem. We show that 
this method can be generalized to minimize any convex and locally Lipschitz continuous function. We refer to 
the generalized method as the bundle method with linear programming for convex optimization (LPBC). Given an 
auxiliary point $y_i$ and a subgradient $s_i \in \partial f(y_i)$, a cutting-plane function is a linear mapping 

$$x \mapsto f(y_i) + \langle s_i, x - y_i \rangle. \tag{6}$$

The cutting-plane model of $f(x)$ is constructed by the point-wise maximum of the cutting-plane functions as 
follows: 

$$m(x) = \max_{i \in I} \{ f(y_i) + \langle s_i, x - y_i \rangle \}, \tag{7}$$

where $I$ is the index set of auxiliary points. LPBC applies the model function on a trust region generated by 
the infinity norm so that it solves the following subproblem sequentially, 

$$\min_{x \in \mathbb{R}^n} m(x) \quad \text{subject to } \|x - \hat{x}\|_\infty \le \Delta, \tag{8}$$

where $\hat{x}$ is the current best candidate for a minimizer of $f$ and $\Delta$ is the trust region radius. Adopting a scalar 
variable $z$, problem (8) is equivalent to the following linear programming problem 

$$\min_{(x, z) \in \mathbb{R}^{n+1}} z \tag{9a}$$

$$\text{subject to } f(y_i) + \langle s_i, x - y_i \rangle \le z, \quad \forall\, i \in I, \tag{9b}$$

$$\|x - \hat{x}\|_\infty \le \Delta. \tag{9c}$$
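The passage from the model (7) to the linear program (9) can be sketched in code. The helper below assembles the data of (9) in the common form "minimize $c^T v$ subject to $A_{ub} v \le b_{ub}$ and simple bounds", with $v = (x, z)$. The function name `build_lp`, the tuple layout of a cut and the returned data layout are assumptions made for illustration; the paper does not prescribe a data format or a solver.

```python
from typing import List, Optional, Sequence, Tuple

# A cut is stored as (y_i, f(y_i), s_i) with s_i a subgradient of f at y_i.
Cut = Tuple[Sequence[float], float, Sequence[float]]
Bound = Tuple[Optional[float], Optional[float]]


def build_lp(cuts: List[Cut], x_hat: Sequence[float], delta: float):
    """Assemble LP (9) in the variables v = (x_1, ..., x_n, z)."""
    n = len(x_hat)
    c = [0.0] * n + [1.0]  # objective (9a): minimize z
    a_ub: List[List[float]] = []
    b_ub: List[float] = []
    for y, fy, s in cuts:
        # (9b): f(y_i) + <s_i, x - y_i> <= z   <=>   <s_i, x> - z <= <s_i, y_i> - f(y_i)
        a_ub.append([float(sj) for sj in s] + [-1.0])
        b_ub.append(sum(sj * yj for sj, yj in zip(s, y)) - fy)
    # (9c): the infinity-norm trust region is a box on x; z is left free.
    bounds: List[Bound] = [(xj - delta, xj + delta) for xj in x_hat]
    bounds.append((None, None))
    return c, a_ub, b_ub, bounds


if __name__ == "__main__":
    # One cut of f(x) = x_1^2 + x_2^2 at y = (1, 0): f(y) = 1, gradient (2, 0).
    cuts = [((1.0, 0.0), 1.0, (2.0, 0.0))]
    print(build_lp(cuts, (0.0, 0.0), 0.5))
```

The returned data can be handed to any LP solver that accepts inequality constraints and variable bounds. The infinity norm is what keeps the trust region constraint linear: it becomes a set of box bounds rather than a quadratic constraint.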


During the $k$th iteration, LPBC solves several linear problems with different model functions $m$ and possibly 
different trust region radii $\Delta$ before a new iterate $x^{k+1}$ is identified. Hence LPBC refers to $x^k$ and $x^{k+1}$ as major 
iterates and to $x^{kl}$, $l = 0, 1, 2, \cdots$, obtained by solving the current linear problem, as minor iterates. We will also 
use $x^*$ to denote minor iterates when it is not necessary to identify the iteration indices. The index $kl$, 
sometimes written $(k, l)$, means that $l$ minor iterations have been executed after the $k$-th major iteration. Consequently, the $\hat{x}$, 
$\Delta$ and $I$ in (9) are replaced by $x^k$, $\Delta_l^k$ and $I(k, l)$. After solving subproblem (9), an optimal solution $(x^{kl}, z^{kl})$ is 
obtained. The point $x^{kl}$ will be accepted as the new iterate if it yields a substantial reduction in the real objective $f$; 
otherwise the model function will be refined by adding and deleting cutting planes. The reduction 
in the value of $f$ is measured by its quotient with the reduction of the model value, i.e. $m_l^k(x^k) - m_l^k(x^{kl})$. LPBC 
updates the model $m$ in a way such that the following conditions hold: 

$$m_l^k(x^k) = f(x^k) \text{ for all indices } k, l. \tag{10}$$

$$m_l^k \text{ is a convex, piecewise linear underestimate of } f, \quad l = 1, 2, \cdots. \tag{11}$$

Specifically, to obtain $m_{l+1}^k$, LPBC flushes all the cutting planes except the following two types. 

• The cutting plane is generated at $x^k$. Thus the cutting plane $f(x^k) + s_k^T (x - x^k)$ is always kept in the 
linear subproblem during the $k$th major iteration; 

• The cutting plane is active at $x^{kl}$ with a positive Lagrange multiplier. 

LPBC then adds the new cutting plane generated at $x^{kl}$, namely $f(x^{kl}) + s_{kl}^T (x - x^{kl})$, to the model. 

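Procedure 1: LPBC Updating Trust Region 

Define 

$$\rho_l^k := \frac{f(x^k) - f(x^{kl})}{f(x^k) - m_l^k(x^{kl})} \tag{12}$$

if $\rho_l^k > \eta_3$ and $\|x^{kl} - x^k\|_\infty > 0.9\Delta$ then 

    $\Delta \leftarrow \min\{\alpha_2 \Delta, \Delta_{\max}\}$ 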
else if p? <-. }, then 

    $\Delta \leftarrow \alpha_1 \Delta$; 
end 
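A minimal sketch of the ratio test in (12) and of the radius update in Procedure 1 follows. The enlargement branch mirrors the procedure as stated. The shrink condition is not legible in this copy of the text, so it is exposed as a caller-supplied parameter `shrink_below`, which is purely an assumption; the names `eta3`, `alpha1`, `alpha2` and `delta_max` stand for $\eta_3$, $\alpha_1$, $\alpha_2$ and $\Delta_{\max}$.

```python
def reduction_ratio(f_center: float, f_trial: float, model_trial: float) -> float:
    """rho in (12): actual reduction of f divided by the reduction predicted by the model."""
    predicted = f_center - model_trial
    if predicted <= 0.0:
        raise ValueError("predicted reduction must be positive to form the ratio")
    return (f_center - f_trial) / predicted


def update_radius(rho: float, step_inf_norm: float, delta: float, *,
                  eta3: float, alpha1: float, alpha2: float,
                  delta_max: float, shrink_below: float) -> float:
    """Trust region update in the spirit of Procedure 1.

    shrink_below is a hypothetical threshold standing in for the shrink
    condition, which is not recoverable from the text.
    """
    if rho > eta3 and step_inf_norm > 0.9 * delta:
        # Good agreement and the step almost reached the boundary: enlarge.
        return min(alpha2 * delta, delta_max)
    if rho < shrink_below:
        # Poor agreement between model and objective: shrink.
        return alpha1 * delta
    return delta


if __name__ == "__main__":
    rho = reduction_ratio(f_center=10.0, f_trial=4.0, model_trial=2.0)
    print(rho, update_radius(rho, 0.95, 1.0, eta3=0.5, alpha1=0.5, alpha2=2.0,
                             delta_max=1.5, shrink_below=0.0))
```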




In the definition of $\rho_l^k$ in (12), the denominator $f(x^k) - m_l^k(x^{kl})$ is the reduction of the model $m(x)$ from $x^k$ 
to $x^{kl}$ due to condition (10). We will use the notion of linearization error, the difference between the value of a 
function and the value of a cutting-plane function. Consider a cutting plane of a generic function, as defined in 
(6). The linearization error of this cutting plane at $x$ is 

$$e_i := f(x) - [f(y_i) + \langle s_i, x - y_i \rangle]. \tag{13}$$
In the LPBC algorithm, the linear subproblem (9) was used. Applying the KKT conditions to (9), we can deduce 
the explicit expression of the model reduction of LPBC as follows, 

$$m(\hat{x}) - m(x^*) = \begin{cases} \sum_{i \in \bar{I}} \lambda_i e_i, & \text{if } \|x^* - \hat{x}\|_\infty < \Delta; \\ \sum_{i \in \bar{I}} \lambda_i e_i + \Delta \left\| \sum_{i \in \bar{I}} \lambda_i s_i \right\|_1, & \text{if } \|x^* - \hat{x}\|_\infty = \Delta, \end{cases} \tag{14}$$

where $\bar{I}$ is the index set for active constraints in (9b) associated with an optimal solution $(x^*, z^*)$, and $\lambda_i$ for $i \in \bar{I}$ 
are the corresponding Lagrangian multipliers associated with $(x^*, z^*)$. Since $f$ is convex, all the linearization 
errors will be nonnegative. We also have 

$$\sum_{i \in \bar{I}} \lambda_i s_i \in \partial_\varepsilon f(\hat{x}), \quad \text{where } \varepsilon = \sum_{i \in \bar{I}} \lambda_i e_i. \tag{15}$$
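For the convex case, the inclusion (15) can be checked directly from the subgradient inequality. The short derivation below is a sketch under the assumptions that the linearization errors $e_i$ in (13) are evaluated at the current point $\hat{x}$ and that the multipliers satisfy $\lambda_i \ge 0$ and $\sum_{i \in \bar{I}} \lambda_i = 1$, as they do for LP multipliers of (9).

```latex
% Subgradient inequality at y_i, rewritten around \hat{x} using (13):
f(z) \;\ge\; f(y_i) + \langle s_i, z - y_i \rangle
     \;=\; f(\hat{x}) + \langle s_i, z - \hat{x} \rangle - e_i
     \qquad \text{for all } z \in \mathbb{R}^n .
% Convex combination with weights \lambda_i:
f(z) \;\ge\; f(\hat{x})
     + \Big\langle \sum_{i \in \bar{I}} \lambda_i s_i ,\, z - \hat{x} \Big\rangle
     - \sum_{i \in \bar{I}} \lambda_i e_i
     \qquad \text{for all } z \in \mathbb{R}^n ,
% which is the definition of
\sum_{i \in \bar{I}} \lambda_i s_i \in \partial_{\varepsilon} f(\hat{x}),
\qquad \varepsilon = \sum_{i \in \bar{I}} \lambda_i e_i .
```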


The derivation of (14) and (15) can be found in Lemma 2, where we prove the same conclusion for the generalization 
of LPBC, the LPBNC algorithm, with the derivation following the same reasoning. The mapping 
$(x, \varepsilon) \mapsto \partial_\varepsilon f(x)$ is outer semicontinuous, and hence when the model reduction decreases to 0 we have $0 \in \partial f(\hat{x})$. 
Consequently, our stopping criterion is that the model reduction is sufficiently small, as shown in the stopping test 
of Algorithm 2. It is worth noting that our model reduction in (14) is comparable with that in the classical 
bundle method, which uses a quadratic model of the form 

$$\min_{x \in \mathbb{R}^n} m(x) + \frac{\mu}{2}\|x - \hat{x}\|^2 \tag{16}$$

with a proximal parameter, written here as $\mu > 0$. If the above model is used, then $m(\hat{x}) - m(x^*) = \sum_{i \in \bar{I}} \lambda_i e_i + \frac{1}{\mu}\left\|\sum_{i \in \bar{I}} \lambda_i s_i\right\|^2$. A significant difference between (8) 
and (16) is that the latter is strictly convex but the former is not. The optimal solution to (16) is unique and 
can be expressed by $x^* = \hat{x} - \frac{1}{\mu}\sum_{i \in \bar{I}} \lambda_i s_i$, while (8) may have multiple solutions. The readers are referred to 
chapters XIV and XV of [9] for a comprehensive understanding of classical bundle methods. 


Algorithm 2: Algorithm LPBC 

Data: Final accuracy tolerance $\varepsilon_{tol}$, maximum trust region radius $\Delta_{\max}$, initial trust region radius 
$\Delta_0^0 \in [1, \Delta_{\max})$, initial point $x^0$, trust region parameters $\eta_1, \eta_2, \eta_3$, integer $T > 20$ (inactive 
threshold); 

 1  Initialization: Set the major and minor iteration counter $(k, l) \leftarrow (0, 0)$; 
 2  generate a cutting plane at $x^k$, update the index set $I(k, l)$ and define the cutting plane model (7). Set 
    the minor iteration counter $l = 0$, $y_1 = x^k$ and compute $s_1 \in \partial f(y_1)$; 
 3  Solve the linear programming subproblem (9) and obtain an optimal solution $(x^{kl}, z^{kl})$; 
 4  if $f(x^k) - m_l^k(x^{kl}) \le (1 + |f(x^k)|)\varepsilon_{tol}$ then 
 5      STOP 
 6  end 
 7  if $\rho_l^k > \eta_1$ then 
 8      $x^{k+1} = x^{kl}$; 
 9      obtain $\Delta_0^{k+1}$ via Procedure 1; 
10      obtain $m_0^{k+1}$ by keeping all cutting planes except those that have been inactive for $T$ iterations; 
11      $k = k + 1$, continue to the next major iteration by going to line 2 
12  else 
13      obtain $\Delta_{l+1}^k$ via Procedure 1; 
14      delete all cutting planes except the one generated at $x^k$, those that are active at $x^{kl}$ and those that 
        have been inactive for less than $T$ times. 
15  end 
16  add the cutting plane $f(x^{kl}) + s_{kl}^T (x - x^{kl})$ to the model 
17  set $l = l + 1$ and go to line 3 


Algorithm LPBNC is a generalization of LPBC and we will provide a convergence proof for it in Section 5. Hence we omit the 
proof here and only state the following convergence theorem for the convex case. 
Theorem 1. Suppose that $\varepsilon_{tol} = 0$. 

(i) If Algorithm 2 terminates at $x^k$, then $x^k$ is a minimizer of $f$ with $x^k = P(x^k)$; 

(ii) if there is an infinite number of minor iterations during the $k$th major iteration, then $x^k$ is a minimizer of 
$f$ with $x^k = P(x^k)$ and $\lim_{l \to \infty} m_l^k(x^{kl}) - f(x^k) = 0$; 

(iii) if the sequence of major iterations $\{x^k\}$ is infinite then $\lim_{k \to \infty} \|x^k - P(x^k)\|_\infty = 0$. 
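To illustrate the overall flow of LPBC, here is a deliberately simplified one-dimensional sketch. It is not the authors' implementation: it keeps every cutting plane instead of deleting inactive ones, it replaces the LP solver by an exact enumeration that is only valid in one dimension (the minimum of a convex piecewise linear function on an interval is attained at an endpoint or at an intersection of two cuts), and it uses a simplified radius rule (double on a good boundary step, halve when the trial point increases $f$). All names, default parameters and the test function are assumptions.

```python
from typing import Callable, List, Tuple

Cut = Tuple[float, float, float]  # (y_i, f(y_i), s_i) with s_i a subgradient at y_i


def model(cuts: List[Cut], x: float) -> float:
    """Cutting-plane model (7): pointwise maximum of the stored cuts."""
    return max(fy + s * (x - y) for (y, fy, s) in cuts)


def solve_subproblem_1d(cuts: List[Cut], center: float, delta: float) -> Tuple[float, float]:
    """Minimize the model over [center - delta, center + delta] by enumerating kinks."""
    lo, hi = center - delta, center + delta
    candidates = [lo, hi]
    for i, (yi, fi, si) in enumerate(cuts):
        for (yj, fj, sj) in cuts[i + 1:]:
            if si != sj:
                x = ((fj - sj * yj) - (fi - si * yi)) / (si - sj)
                if lo <= x <= hi:
                    candidates.append(x)
    x_star = min(candidates, key=lambda x: model(cuts, x))
    return x_star, model(cuts, x_star)


def lpbc_1d(f: Callable[[float], float], subgrad: Callable[[float], float], x0: float,
            delta: float = 1.0, delta_max: float = 10.0, eta1: float = 1e-4,
            eta3: float = 0.5, tol: float = 1e-8, max_iter: int = 200) -> float:
    """Simplified LPBC loop: returns the last major iterate."""
    x_k = x0
    cuts: List[Cut] = [(x_k, f(x_k), subgrad(x_k))]
    for _ in range(max_iter):
        x_trial, z = solve_subproblem_1d(cuts, x_k, delta)
        predicted = f(x_k) - z  # model reduction, nonnegative for convex f
        if predicted <= (1.0 + abs(f(x_k))) * tol:
            break  # stopping test on the model reduction
        rho = (f(x_k) - f(x_trial)) / predicted
        if rho > eta3 and abs(x_trial - x_k) > 0.9 * delta:
            delta = min(2.0 * delta, delta_max)
        elif rho < 0.0:
            delta = 0.5 * delta  # simplified shrink rule (assumption)
        cuts.append((x_trial, f(x_trial), subgrad(x_trial)))
        if rho > eta1:
            x_k = x_trial  # serious step: new major iterate
    return x_k


def example_f(x: float) -> float:
    """Convex nonsmooth test function (an assumption) with minimizer x = 0.5."""
    return x * x + abs(x - 1.0)


def example_subgrad(x: float) -> float:
    """One subgradient of example_f; at the kink x = 1 the value 2 lies in [1, 3]."""
    sign = 1.0 if x > 1.0 else (-1.0 if x < 1.0 else 0.0)
    return 2.0 * x + sign


if __name__ == "__main__":
    print(lpbc_1d(example_f, example_subgrad, x0=4.0))
```

Keeping all cuts makes the sketch easy to reason about but gives up the bounded-storage benefit of the deletion rule in Algorithm 2; in higher dimensions the enumeration must be replaced by a genuine LP solve of (9).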

4 LP-bundle method : The nonconvex case 

4.1 Derivation of the Method 

Based on the LPBC, we can derive a nonconvex version of the method through convexification for special 
types of functions. Specifically, we consider locally Lipschitz continuous functions that are prox-regular. Such 
functions are para-convex. Hence we can use a linear model to approximate a locally convex function. Under 
some assumptions, we show that the accumulation point of the minimizers of such functions is a stationary 
point of the objective function via the theory of proximal point mapping. 

First we state the assumption on the objective function. 

Assumption 1. The objective function $f$ is locally Lipschitz continuous and bounded below. Given $x^0 \in \mathbb{R}^n$, $f$ 
is prox-regular on the bounded level set $\mathrm{lev}_{x^0} f$. 

Note that a function $f$ being prox-regular on an open set $O$ and locally Lipschitz continuous is 
equivalent to $f$ being lower-$C^2$ on $O$; see [22, Prop. 13.33]. Define 

$$g(y) := g(y; x, a) : y \mapsto f(y) + \frac{a}{2}\|y - x\|^2 \tag{17}$$

with $x$ and $a$ as parameters. Under Assumption 1, the Moreau envelope function and the proximal point 
mapping associated with the objective $f$ are globally well defined. We redefine them here as 

$$e_a(x) := \min_{y \in \mathbb{R}^n} \{ g(y; x, a) \}, \qquad P_a(x) := \arg\min_{y \in \mathbb{R}^n} \{ g(y; x, a) \}. \tag{18}$$

If $f$ is para-convex, then according to Definition 2.3 $g(y)$ is convex on some neighborhood $B_b(x)$. Clearly, for 
different $x$ and $b$, in order to make $g(y)$ convex with respect to $y$, there exists a threshold for the value of $a$. 
The motivation of our method is based on the following observation. Suppose we have some sequences $x^k \to x'$, 
$a^k \to a'$, and $b^k \to b'$ such that $g(y; x^k, a^k)$ is convex with respect to $y$ on $B_{b^k}(x^k)$ for all $k$. Then we can 
use a cutting-plane model for $g(y; x^k, a^k)$ with a trust region as in LPBC to obtain descent locally. To justify its 
stationarity, $x'$ should be a global minimizer of $g(y; x', a')$. In fact, the outer semicontinuity of the mapping 
$(x, a) \mapsto P_a(x)$ means that if there exist $x^k \to \bar{x}$, $\{a^k\}$ bounded and $p^k \in P_{a^k}(x^k)$ with $\|x^k - p^k\| \to 0$, then $\bar{x} \in P_a(\bar{x})$ 
for some $a$. Thus in order to find a stationary point our goal can be translated to generating sequences $\{x^n\}$ 
and $\{p^n\}$ such that $\lim_{n \to \infty} \|x^n - p^n\|_\infty = 0$ with $p^n \in P_{a^n}(x^n)$ and $\{a^n\}$ bounded. Further discussion will be 
made in Section 5. 

The following lemma shows that there exists a threshold for the value $a$ such that the function $g(y)$ is locally 
convex on $\mathrm{lev}_{x^0} f$ (which is not necessarily convex). 

Lemma 1. Under Assumption 1, there exists a number $\bar{a} \ge 0$ such that for any $a \ge \bar{a}$, the function $g(y; x, a)$ is 
convex on a neighborhood of $y$ for all $y \in \mathrm{lev}_{x^0} f$. 

Proof. According to Assumption 1, $f$ is locally Lipschitz continuous and, given $x^0 \in \mathbb{R}^n$, $f$ is prox-regular 
at each point in the compact set $\mathrm{lev}_{x^0} f$. For all $x \in \mathrm{lev}_{x^0} f$, there exist $\varepsilon(x)$ and $a(x)$ such that $f$ is 
prox-regular at $x$ with respect to $\varepsilon(x)$ and $a(x)$. By Proposition 1, $f$ is para-convex on $B(x, \varepsilon(x))$ with respect 
to $a(x)$ for all $x \in \mathrm{lev}_{x^0} f$; i.e. the function $g(y; a(x)) := f(y) + \frac{a(x)}{2}\|y\|^2$ is convex on $B(x, \varepsilon(x))$ for 
each $x \in \mathrm{lev}_{x^0} f$. We see that $\{\mathrm{int}\, B(x, \varepsilon(x)) \mid x \in \mathrm{lev}_{x^0} f\}$ is an open cover of $\mathrm{lev}_{x^0} f$ and it has a finite subcover 
$\{\mathrm{int}\, B(x_i, \varepsilon(x_i)) \mid i = 1, \cdots, m\}$ corresponding to some $a(x_i)$, $i = 1, \cdots, m$. Define $\bar{a} := \max\{a(x_i) \mid i = 1, \cdots, m\}$. 
Then the function 

$$g(y; \bar{a}) = g(y; a(x_i)) + \frac{\bar{a} - a(x_i)}{2}\|y\|^2, \quad \text{for all } i = 1, \cdots, m,$$

is also convex on each $B(x_i, \varepsilon(x_i))$. Consequently, 
$g(y; x, \bar{a}) = g(y; \bar{a}) - \bar{a}\langle y, x \rangle + \frac{\bar{a}}{2}\|x\|^2$ is convex on each $B(x_i, \varepsilon(x_i))$ and so is $g(y; x, a) = g(y; x, \bar{a}) + \frac{a - \bar{a}}{2}\|y - x\|^2$ 
for all $a \ge \bar{a}$. □ 
The following theorem shows that there exists a threshold for the value $a$ such that the function $g(y)$ is the 
restriction to $\mathrm{lev}_{x^0} f$ of a convex function. See the Appendix for the proof of this theorem. 

Theorem 2. Suppose $f$ is prox-regular and locally Lipschitz on a bounded level set $\mathrm{lev}_{x^0} f$ with $\mathrm{int}\, \mathrm{lev}_{x^0} f \ne \emptyset$. 
Let $g(y; x, a)$ be defined in (17) with $a \ge 0$. There exists a number $a^{th} > 0$ such that for all $a \ge a^{th}$, $g(y; x, a)$ is 
the restriction to $\mathrm{lev}_{x^0} f$ of a globally convex function $H(y; x, a)$ satisfying $g(y; x, a) \ge H(y; x, a)$ for all $y \in \mathbb{R}^n$ 
and $x \in \mathrm{lev}_{x^0} f$. 

Theorem [2] essentially shows that g can be described as a restriction of a convex function. We will in the 
future refer to this as the ’restriction property’. 


4.2 On-The-Fly Convexification 

The threshold value is hard to find. Our goal in this section (see also Section lT4ll is to find a lower bound 
for the parameter $a$ such that $g(y; x, a)$ is a restriction of a convex function locally within $\mathrm{lev}_{x^0} f$. We first 
introduce the convexification technique which first appeared in [5]. Suppose we are at the current iteration 
point, i.e. the current best candidate for a stationary point of $f$. We denote this point by $\hat{x}$ in general and by 
$x^k$ when it is necessary to indicate that it belongs to the $k$-th iteration. A necessary condition for $g$ to be such a restriction on a subset is that all the 
cutting planes generated at the points in the subset lie below the graph of $g$, since a convex function is 
essentially represented by the point-wise supremum of its cutting-plane functions. 

Denote by $\partial g(y; x, a)$ the subdifferential of the function $g$ with respect to the variable $y$. It follows from the calculus 
of subdifferentials and Assumption 1 that 

$$\partial g(y; x, a) = \partial f(y) + a(y - x) \quad \text{and} \quad \partial g(x; x, a) = \partial f(x) \tag{19}$$

for all $y$ and $x$ in the set $\mathrm{lev}_{x^0} f$. For any $s \in \partial f(y)$ and $y \in \mathrm{lev}_{x^0} f$, clearly we have $s + a(y - x) \in \partial g(y; x, a)$. 
Consequently, the cutting-plane function of $g$ at the point $y_i$ can be written as 

$$h(w; y_i) := h(w; x, a, y_i) : w \mapsto f(y_i) + \frac{a}{2}\|y_i - x\|^2 + \langle s_i + a(y_i - x), w - y_i \rangle, \tag{20}$$

where $s_i \in \partial f(y_i)$. According to Theorem 2, under Assumption 1, if $a$ is at least the threshold value of Theorem 2, written $a^{th}$, then $g$ is a restriction to $\mathrm{lev}_{x^0} f$ of a 
convex function $H$ minorizing $g$. Thus a cutting plane of $g$ generated at an auxiliary point $y_i \in \mathrm{int}\, \mathrm{lev}_{x^0} f$ is 
the same as the cutting plane of $H$ generated at $y_i$; additionally, as $H$ is a convex function minorizing $g$, this cutting 
plane is not only below the graph of $H$ but also below that of $g$. In summary we have 

$$g(y; x, a) - h(y; x, a, y_i) \ge 0, \quad \forall\, y \in \mathbb{R}^n,\ x \in \mathrm{lev}_{x^0} f,\ y_i \in \mathrm{int}\, \mathrm{lev}_{x^0} f,\ a \ge a^{th}. \tag{21}$$


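We provide a localized convexification process by selecting a collection of points around the current iteration 
point and verifying the necessary condition for convexity. We let $a$ be variable and $I$ be some index set, and 
set 

$$g(y_j; x, a) - h(y_j; x, a, y_i) \ge 0, \quad \forall\, i, j \in I; \tag{22}$$

to deduce the necessary condition for $a$. Expanding (17) and (20), the left-hand side of (22) equals 
$f(y_j) - f(y_i) - \langle s_i, y_j - y_i \rangle + \frac{a}{2}\|y_j - y_i\|^2$, so (22) holds if and only if $a \ge \hat{a}^{\min}$, where 

$$\hat{a}^{\min} := \max_{i, j \in I,\ y_i \ne y_j} \left\{ -\frac{2\left[ f(y_j) - f(y_i) - \langle s_i, y_j - y_i \rangle \right]}{\|y_j - y_i\|^2} \right\}. \tag{23}$$

This value can be negative, so we set $a^{\min} := \max\{\hat{a}^{\min}, 0\}$. Consequently, for any $a \ge a^{\min}$, (22) holds true. 

Note that $a^{\min}$ is dependent on the points indexed in $I$. Consequently, each time a new auxiliary point $y_i$ 
is obtained, $a^{\min}$ needs to be updated. We also note that $a^{\min}$, the local lower bound for the convexification 
parameter determined by (23), where $x \in \mathrm{lev}_{x^0} f$ and $y_i \in \mathrm{int}\, \mathrm{lev}_{x^0} f$ for all $i \in I$, is not greater than the threshold value of Theorem 2 that satisfies 
(21). 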
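The local bound on the convexification parameter can be computed directly from the bundle. The sketch below evaluates the pairwise quantity $-2[f(y_j) - f(y_i) - \langle s_i, y_j - y_i \rangle] / \|y_j - y_i\|^2$ over all pairs of distinct bundle points and clips the maximum at zero. The function name `a_min`, the tuple representation of points and the test functions are assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Point = Sequence[float]


def a_min(points: List[Point],
          f: Callable[[Point], float],
          subgrad: Callable[[Point], Sequence[float]]) -> float:
    """Smallest a >= 0 for which every cut of g lies below g at every bundle point."""
    a_hat = float("-inf")
    for yi in points:
        si = subgrad(yi)
        for yj in points:
            dist_sq = sum((u - v) ** 2 for u, v in zip(yj, yi))
            if dist_sq == 0.0:
                continue  # identical points impose no condition on a
            # Linearization error of f at y_j for the cut generated at y_i.
            lin_err = f(yj) - f(yi) - sum(s * (u - v) for s, u, v in zip(si, yj, yi))
            a_hat = max(a_hat, -2.0 * lin_err / dist_sq)
    return max(a_hat, 0.0)


if __name__ == "__main__":
    bundle = [(0.0,), (0.5,), (1.0,), (-1.5,)]
    # Concave quadratic f(y) = -y^2: f + (a/2) y^2 is convex exactly when a >= 2.
    print(a_min(bundle, lambda y: -y[0] ** 2, lambda y: (-2.0 * y[0],)))
    # Convex quadratic f(y) = y^2: no convexification is needed.
    print(a_min(bundle, lambda y: y[0] ** 2, lambda y: (2.0 * y[0],)))
```

The cost is quadratic in the bundle size; when a single new point is added, only the pairs involving that point need to be evaluated to refresh the bound.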


4.3 The Model Problem and Model Reduction 

The cutting-plane model of $g(y; x, a)$ is defined by 

$$m(w) := m(w; x, a, I) : w \mapsto \max_{i \in I} \{ h(w; x, a, y_i) \}, \tag{24}$$

where $I$ is the index set of auxiliary points $y_i$ where cutting planes of the function $g$ are generated. Our algorithmic 
model is defined by (24) and in the remainder of this paper, $m$ refers to the model defined in (24) unless 
otherwise stated. Suppose we are at $\hat{x}$. To proceed in finding a new candidate, we intend to obtain descent in 
$f$ by minimizing the cutting-plane model of $g(\cdot; \hat{x}, a)$ over a trust region, i.e. we solve the linear subproblem 

$$\min\ m(x; \hat{x}, a, I) \quad \text{subject to } \|x - \hat{x}\|_\infty \le \Delta, \tag{25}$$

which is equivalent to the following problem 

$$\min_{(x, z) \in \mathbb{R}^{n+1}} z \tag{26a}$$

$$\text{subject to } f(y_i) + \frac{a}{2}\|y_i - \hat{x}\|^2 + \langle s_i + a(y_i - \hat{x}), x - y_i \rangle \le z, \quad i \in I, \tag{26b}$$

$$\|x - \hat{x}\|_\infty \le \Delta. \tag{26c}$$
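The effect of the convexification parameter on the cuts in (26b) can be seen on a small one-dimensional example. The sketch below builds the cutting planes of $g(\cdot; \hat{x}, a)$ for the nonconvex function $f(y) = y^4 - y^2$ and evaluates the model and the gaps $g(\hat{x}; \hat{x}, a) - h(\hat{x}; \hat{x}, a, y_i)$ at the prox-center. The test function, the bundle points and all names are assumptions; since $f'' \ge -2$, the value $a = 2$ is enough to make $g$ convex for this particular $f$.

```python
from typing import List


def f(y: float) -> float:
    """Illustrative nonconvex objective (an assumption), bounded below."""
    return y ** 4 - y ** 2


def df(y: float) -> float:
    """Derivative of f, used as the subgradient s_i."""
    return 4.0 * y ** 3 - 2.0 * y


def h(w: float, x_hat: float, a: float, y: float) -> float:
    """Cutting plane of g(.; x_hat, a) generated at y, evaluated at w."""
    slope = df(y) + a * (y - x_hat)
    return f(y) + 0.5 * a * (y - x_hat) ** 2 + slope * (w - y)


def model(w: float, x_hat: float, a: float, bundle: List[float]) -> float:
    """Cutting-plane model: pointwise maximum of the cuts over the bundle."""
    return max(h(w, x_hat, a, y) for y in bundle)


def gaps_at_center(x_hat: float, a: float, bundle: List[float]) -> List[float]:
    """g(x_hat; x_hat, a) - h(x_hat; x_hat, a, y_i) for each bundle point; g(x_hat) = f(x_hat)."""
    return [f(x_hat) - h(x_hat, x_hat, a, y) for y in bundle]


if __name__ == "__main__":
    x_hat, bundle = 0.5, [0.5, -0.2, 0.9]
    print(gaps_at_center(x_hat, 0.0, bundle))  # some gaps are negative without convexification
    print(gaps_at_center(x_hat, 2.0, bundle))  # all gaps are nonnegative with a = 2
    print(model(x_hat, x_hat, 2.0, bundle), f(x_hat))
```

With $a = 0$ one of the cuts overshoots $f$ at the prox-center, so the model would overestimate the objective there; with $a = 2$ all gaps are nonnegative and the model value at $\hat{x}$ coincides with $f(\hat{x})$.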


We would like to inspect the reduction of the model function $m$ after we have obtained a new trial point via 
the linear programming problem (26). Denote a general optimal solution of problem (26) by $(x^*, z^*)$ and by 
$(x^{kl}, z^{kl})$ if it is necessary to indicate that it belongs to the $l$-th minor iteration in the $k$-th major iteration. In the 
following lemma we derive the explicit expression of the reduction of the model from $\hat{x}$ to $x^*$. We will use the 
linearization errors of $g(\cdot; \hat{x}, a)$ at $\hat{x}$: 

$$E_i := g(\hat{x}; \hat{x}, a) - h(\hat{x}; \hat{x}, a, y_i) \tag{27}$$

$$\phantom{E_i :}= f(\hat{x}) - \left[ f(y_i) + \frac{a}{2}\|y_i - \hat{x}\|^2 + \langle s_i + a(y_i - \hat{x}), \hat{x} - y_i \rangle \right], \quad i \in I. \tag{28}$$


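Lemma 2. Consider the linear problem (26). Let $\bar{\imath} \in I$ be such that $\hat{x} = y_{\bar{\imath}}$, $m(x) := m(x; \hat{x}, a, I)$ be defined as in 
(24), $(x^*, z^*)$ be an optimal solution of (26) and suppose $a$ is such that $E_i \ge 0$, $i \in I$. 

(i) The following holds true 

$$m(\hat{x}) - m(x^*) = f(\hat{x}) - z^* = \begin{cases} \sum_{i \in \bar{I}} \lambda_i E_i, & \text{if } \|x^* - \hat{x}\|_\infty < \Delta; \\ \sum_{i \in \bar{I}} \lambda_i E_i + \Delta \left\| \sum_{i \in \bar{I}} \lambda_i [s_i + a(y_i - \hat{x})] \right\|_1, & \text{if } \|x^* - \hat{x}\|_\infty = \Delta, \end{cases} \tag{29}$$

where $\bar{I}$ is the index set for active constraints in (26b) at $(x^*, z^*)$, and $\lambda_i$ for $i \in \bar{I}$ are the corresponding 
Lagrangian multipliers of (26b). 

(ii) Let $C$ be any set satisfying $\hat{x} \in \mathrm{int}\, C$ and $y_i \in \mathrm{int}\, C$ for all $i \in I$. If additionally $a$ is such that $g(y; \hat{x}, a)$ 
is a restriction to $C$ of a globally convex function $H(y; \hat{x}, a)$ satisfying $g(y; \hat{x}, a) \ge H(y; \hat{x}, a)$ for all $y \in \mathbb{R}^n$, 
then 

$$\sum_{i \in \bar{I}} \lambda_i [s_i + a(y_i - \hat{x})] \in \partial_\varepsilon g(\hat{x}; \hat{x}, a), \quad \text{where } \varepsilon = \sum_{i \in \bar{I}} \lambda_i E_i; \tag{30}$$

and furthermore, if $0 \in \partial_0 g(\hat{x}; \hat{x}, a)$, then $\hat{x}$ is a global minimizer of $g(y; \hat{x}, a)$. 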

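Proof. (i) By definition $m(\hat{x}) = \max_{i \in I} \{ h(\hat{x}; \hat{x}, a, y_i) \} \ge h(\hat{x}; \hat{x}, a, y_{\bar{\imath}})$ and since $\hat{x} = y_{\bar{\imath}}$ we have $h(\hat{x}; \hat{x}, a, y_{\bar{\imath}}) = 
f(y_{\bar{\imath}}) = f(\hat{x})$ by the definition of the function $h$ in (20). As $a$ is such that $E_i \ge 0$ for all $i \in I$, we get 
$\max_{i \in I} \{ h(\hat{x}; \hat{x}, a, y_i) \} \le g(\hat{x}; \hat{x}, a) = f(\hat{x})$ via the definition of $E_i$ in (27). Consequently, $m(\hat{x}) = f(\hat{x})$. Problem (26) 
is equivalent to (25) in the sense that the optimal solution $(x^*, z^*)$ satisfies $m(x^*) = z^* = \min\{ m(x) \mid \|x - \hat{x}\|_\infty \le 
\Delta \}$. Therefore $m(\hat{x}) - m(x^*) = f(\hat{x}) - z^*$. The set $\bar{I}$ in (29) is defined by 

$$\bar{I} := \left\{ i \in I \;\middle|\; f(y_i) + \frac{a}{2}\|y_i - \hat{x}\|^2 + \langle s_i + a(y_i - \hat{x}), x^* - y_i \rangle = z^* \right\}. \tag{31}$$

$\bar{I}$ cannot be empty because $(x^*, z^*)$ has to be on some cutting plane. If $\|x^* - \hat{x}\|_\infty = \Delta$, then one of the sets 

$$I_R := \{ i \in \{1, \cdots, n\} \mid x_i^* = \hat{x}_i + \Delta \} \quad \text{and} \quad I_L := \{ i \in \{1, \cdots, n\} \mid x_i^* = \hat{x}_i - \Delta \} \tag{32}$$

will be nonempty. As $(x^*, z^*)$ is the optimal solution of the linear problem (26), it satisfies the Karush-Kuhn-Tucker 
(KKT) conditions; that is, there exist multipliers $\lambda_i \ge 0$, $i \in \bar{I}$, $u_i \ge 0$, $i \in I_R$, and $w_i \ge 0$, $i \in I_L$ such that 

$$\sum_{i \in \bar{I}} \lambda_i [s_i + a(y_i - \hat{x})] + \sum_{i \in I_R} u_i e_i - \sum_{i \in I_L} w_i e_i = 0, \tag{33}$$

$$1 - \sum_{i \in \bar{I}} \lambda_i = 0, \tag{34}$$

where $e_i$, $i = 1, \cdots, n$ are vectors in $\mathbb{R}^n$ with the $i$-th component being 1 and the others 0. If $\|x^* - \hat{x}\|_\infty < \Delta$, 
then (33) simply reduces to 

$$\sum_{i \in \bar{I}} \lambda_i [s_i + a(y_i - \hat{x})] = 0. \tag{35}$$

We have 

$$\begin{aligned} f(\hat{x}) - z^* &= \sum_{i \in \bar{I}} \lambda_i [f(\hat{x}) - z^*] \quad \text{(by (34))} \\ &= \sum_{i \in \bar{I}} \lambda_i \left\{ f(\hat{x}) - \left[ f(y_i) + \frac{a}{2}\|y_i - \hat{x}\|^2 + \langle s_i + a(y_i - \hat{x}), x^* - y_i \rangle \right] \right\} \quad \text{(by (31))} \\ &= \sum_{i \in \bar{I}} \lambda_i \left\{ f(\hat{x}) - \left[ f(y_i) + \frac{a}{2}\|y_i - \hat{x}\|^2 + \langle s_i + a(y_i - \hat{x}), \hat{x} - y_i \rangle \right] \right\} - \sum_{i \in \bar{I}} \lambda_i \langle s_i + a(y_i - \hat{x}), x^* - \hat{x} \rangle. \end{aligned} \tag{36}$$

Consider the first case when $\|x^* - \hat{x}\|_\infty < \Delta$. Then by (28), (35) and (36) we have 

$$f(\hat{x}) - z^* = \sum_{i \in \bar{I}} \lambda_i E_i.$$

If $\|x^* - \hat{x}\|_\infty = \Delta$, then by (28), (33) and (36) we have 

$$\begin{aligned} f(\hat{x}) - z^* &= \sum_{i \in \bar{I}} \lambda_i E_i + \Big\langle \sum_{i \in I_R} u_i e_i - \sum_{j \in I_L} w_j e_j,\ x^* - \hat{x} \Big\rangle \\ &= \sum_{i \in \bar{I}} \lambda_i E_i + \Delta \Big( \sum_{i \in I_R} u_i + \sum_{i \in I_L} w_i \Big) \quad \text{(by definition of } I_R \text{ and } I_L\text{)}. \end{aligned} \tag{37}$$

On the other hand, 

$$\begin{aligned} \Big\| \sum_{i \in \bar{I}} \lambda_i [s_i + a(y_i - \hat{x})] \Big\|_1 &= \Big\| \sum_{i \in I_R} u_i e_i - \sum_{i \in I_L} w_i e_i \Big\|_1 \quad \text{(by (33))} \\ &= \sum_{i \in I_R} u_i + \sum_{i \in I_L} w_i \quad \text{(by definition of } I_R \text{ and } I_L\text{)}. \end{aligned} \tag{38}$$

Combining (37) and (38) we get 

$$f(\hat{x}) - z^* = \sum_{i \in \bar{I}} \lambda_i E_i + \Delta \Big\| \sum_{i \in \bar{I}} \lambda_i [s_i + a(y_i - \hat{x})] \Big\|_1.$$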


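(ii) By the restriction property in Theorem 2 we have $g(y; \hat{x}, a) = H(y; \hat{x}, a)$ for all $y \in C$, and $\partial g(y; \hat{x}, a) = 
\partial H(y; \hat{x}, a)$ for all $y \in \mathrm{int}\, C$. As $H$ is globally convex, $\partial_\varepsilon g(y; \hat{x}, a)$ can be defined by $\partial_\varepsilon g(y; \hat{x}, a) := \partial_\varepsilon H(y; \hat{x}, a)$ 
for any $y \in \mathrm{int}\, C$. For all $i \in \bar{I}$, since $y_i \in \mathrm{int}\, C$, we have $s_i + a(y_i - \hat{x}) \in \partial H(y_i; \hat{x}, a)$. Thus we have that 

$$\begin{aligned} g(z; \hat{x}, a) &\ge H(z; \hat{x}, a) \quad \text{for all } z \in \mathbb{R}^n \\ &\ge H(y_i; \hat{x}, a) + \langle s_i + a(y_i - \hat{x}), z - y_i \rangle \quad \text{for all } i \in \bar{I} \text{ and } z \in \mathbb{R}^n \text{ (by definition of convex subgradient)} \\ &= g(y_i; \hat{x}, a) + \langle s_i + a(y_i - \hat{x}), z - y_i \rangle \quad \text{(by the restriction property)} \\ &= g(\hat{x}; \hat{x}, a) + \langle s_i + a(y_i - \hat{x}), z - \hat{x} \rangle - E_i \quad \text{(by definition of } E_i\text{)}. \end{aligned} \tag{39}$$

The convex combination of (39) with $\lambda_i$ satisfying (34) yields 

$$g(z; \hat{x}, a) \ge g(\hat{x}; \hat{x}, a) + \Big\langle \sum_{i \in \bar{I}} \lambda_i [s_i + a(y_i - \hat{x})],\ z - \hat{x} \Big\rangle - \sum_{i \in \bar{I}} \lambda_i E_i, \quad \text{for all } z \in \mathbb{R}^n. \tag{40}$$

By the definition of the $\varepsilon$-subdifferential in convex analysis, (40) verifies (30). 

If $0 \in \partial_0 g(\hat{x}; \hat{x}, a)$, then $0 \in \partial H(\hat{x}; \hat{x}, a)$ since $\hat{x} \in \mathrm{int}\, C$ and $\partial H(y; \hat{x}, a) = \partial g(y; \hat{x}, a)$ for all $y \in \mathrm{int}\, C$. 
Because $H(y; \hat{x}, a)$ is a convex function, its stationary points are also global minimizers. Consequently, for any 
$z \in \mathbb{R}^n$, we have $g(\hat{x}; \hat{x}, a) = H(\hat{x}; \hat{x}, a) \le H(z; \hat{x}, a) \le g(z; \hat{x}, a)$. We note that another simple way to see this 
conclusion is that $0 \in \partial g(\hat{x}; \hat{x}, a) = \partial f(\hat{x})$, which is equivalent to $\hat{x} \in P_a(\hat{x})$. □ 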

Remark. (i) The conclusions in Lemma 2 are very similar to those of the classical bundle methods. The convex 
combination of subgradients, $\sum_{i \in \bar{I}} \lambda_i [s_i + a(y_i - \hat{x})]$, is involved in both situations and is sometimes 
termed an aggregate subgradient. The subgradient aggregation technique was developed in [13], where aggregate 
subgradients together with the convex combinations of linearization errors are used to represent additional virtual 
cutting planes in the model. The subgradient aggregation technique has many applications, including preventing 
unbounded storage caused by too many cutting planes. 

(ii) It is not difficult to see that the model reduction is always nonnegative provided that $E_i \ge 0$ for all 
$i \in I$. From the definition of $E_i$ in (27), this can be guaranteed by choosing $a \ge a^{\min}$, which satisfies (22). 

(iii) The expression of the model reduction in (29) can also be stated as 

$$m(\hat{x}) - m(x^*) = \sum_{i \in \bar{I}} \lambda_i E_i + \Delta \Big\| \sum_{i \in \bar{I}} \lambda_i [s_i + a(y_i - \hat{x})] \Big\|_1 \tag{41}$$

with $\sum_{i \in \bar{I}} \lambda_i [s_i + a(y_i - \hat{x})] = 0$ if $\|x^* - \hat{x}\|_\infty < \Delta$. 

Lemma IH implies that the model reduction can somehow help us to determine whether x is a good estimate 
of a stationary point. If / is convex, generally speaking, a good estimate x of a minimizer of / should satisfy that 
both min 11| and /(x) — f{p) are very small, where p is a minimizer of /. In the convex case can be 

gedf{x) 

a lower bound for min ||p||. Motivated by this, in the next lemma we try to relate the model reduction with 

gedf(x) 

the above two approximate measures of a good estimate of a stationary point in the nonconvex case through 
the restricted convexity. We will use the following set which essentially defines the model m{x). 

P '■= {Ui I * G /}, where / := {f G / |3 x G R” such that m(x) = /i(x; x, a, j/i)}. (42) 

Lemma 3. Given a prox-center $x \in \mathrm{lev}_{x^0} f$, a trust region radius $\Delta$ and an index set $I$ containing some $i$ such
that $x = y_i$. Let $m(x)$ defined in (24) be the objective function of problem (25), '$a$' be such that $E_i \ge 0$, $\forall\, i \in I$,
$x^*$ be the first component of an optimal solution of problem (26), and $F$ be defined in (42).

If '$a$' is such that $g(x; x, a)$ is a restriction to set $F$ of $H(x)$, where $H(x)$ is convex on $\mathbb{R}^n$ and satisfies
$H(x) \le g(x)$ for all $x \in \mathbb{R}^n$, then

(a) $m(x)$ is a cutting-plane model of $H(x)$ and satisfies $m(x) \le H(x)$, $\forall\, x \in \mathbb{R}^n$;

(b) if $p \in P_a(x)$ and $x \notin P_a(x)$, then

$$m(x) - m(x^*) \ge [f(x) - e_a(x)] \min\Big\{ \frac{\Delta}{\|x - p\|_\infty}, 1 \Big\} > 0. \qquad (43)$$

Proof. (a) We see that $\bar{I}$ indexes all the cutting planes that sufficiently define $m(x)$. The model $m(x)$ is
essentially the pointwise maximum of cutting planes of $g(x)$ generated at bundle points where the values of $g$
and $H$ coincide, and hence $m(x)$ is also a cutting-plane model of $H(x)$. Since $H(x)$ is convex, its cutting-plane
model $m(x)$ is a lower approximation of $H$. This finishes the proof of (a).

(b) The proof of (43) can be divided into two parts based on the possible positions of $p$. First, suppose $p$ is
located in the trust region, i.e. $\|x - p\|_\infty \le \Delta$, which yields $\min\{\frac{\Delta}{\|x - p\|_\infty}, 1\} = 1$. To show (43) it suffices to show
$m(x) - m(x^*) \ge f(x) - e_a(x)$. By Lemma 2(i), $m(x) = f(x)$, and hence we only need to show $m(x^*) \le e_a(x)$.
From the optimality of $x^*$, conclusion (a), the fact that $H(x) \le g(x)$ and the fact that $g(p) = e_a(x)$ for $p \in P_a(x)$, we have

$$m(x^*) \le m(p) \le H(p) \le g(p) = e_a(x).$$

Second, suppose $p$ is outside the trust region, i.e. $\|x - p\|_\infty > \Delta$, which yields $\min\{\frac{\Delta}{\|x - p\|_\infty}, 1\} = \frac{\Delta}{\|x - p\|_\infty}$. To
show (43) it suffices to show

$$m(x) - m(x^*) \ge \frac{\Delta}{\|x - p\|_\infty} [f(x) - e_a(x)]. \qquad (44)$$

We consider the point $x^\Delta := x + \frac{\Delta}{\|x - p\|_\infty}(p - x)$, the intersection point of the trust region boundary and the line segment $[x, p]$.
By the optimality of $x^*$, the result (a), the convexity of $H$, the fact that $H(x) \le g(x)$, the fact that $g(p) = e_a(x)$ and the fact that
$g(x) = f(x)$, we have

$$m(x^*) \le m(x^\Delta) \le H(x^\Delta) \le \frac{\Delta}{\|x - p\|_\infty} H(p) + \Big(1 - \frac{\Delta}{\|x - p\|_\infty}\Big) H(x) \le \frac{\Delta}{\|x - p\|_\infty} e_a(x) + \Big(1 - \frac{\Delta}{\|x - p\|_\infty}\Big) f(x).$$

Then (44) can be verified using $f(x) = m(x)$. □
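The last step of the proof can be written out explicitly. The short derivation below uses only the chain of inequalities for $m(x^*)$ and the identity $f(x) = m(x)$; the shorthand $t$ is introduced here for brevity and is not part of the original notation.

```latex
\begin{align*}
t &:= \frac{\Delta}{\|x - p\|_\infty} \in (0, 1)
  && \text{(since } \|x - p\|_\infty > \Delta > 0\text{)}, \\
m(x) - m(x^*) &\ge f(x) - \bigl[\, t\, e_a(x) + (1 - t)\, f(x) \,\bigr]
  && \text{(using } m(x) = f(x)\text{)} \\
&= t\, [\, f(x) - e_a(x) \,],
\end{align*}
```

which is exactly the right-hand side of (44).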

4.4 Update of the Model

At the end of each iteration (either major or minor), we need to update the model and prepare the data
for the new LP in the next iteration. The update is supposed to improve the model. It
includes adding new cutting planes and deleting old cutting planes, i.e. adding and removing points from the
set $\Omega := \{y_i \mid i \in I\}$. We also take into account the update of the convexification parameter $a$ and of $a^{\min}$ when
considering updating the model, as $a$ is also part of the model.

In our method, we always add one cutting plane at the end of each iteration. Specifically, at the end of
a major iteration, we obtain $x^{k+1}$ as our new prox-center. A cutting plane needs to be generated at this
point, and hence we add $x^{k+1}$ to $\Omega$ so that there exists an $i \in I$ such that $y_i = x^{k+1}$. At the end of a minor
iteration, we obtain $x^{kl}$, which did not yield sufficient reduction of the objective function. A cutting plane is
supposed to be generated at this point to improve the quality of the model. However, $x^{kl}$ could be very bad in
the sense that $f(x^{kl})$ is too far away from $f(x^k)$ or even bigger than $f(x^0)$. In this case we backtrack along the
direction $x^{kl} - x^k$ until we find a point whose function value is less than some upper bound $f_u^k$. This is a finite
process provided that $f_u^k > f(x^k)$, as we will prove in Lemma 5.

After backtracking, we add the point found into $\Omega$, and consequently all our new bundle points, i.e. those that
are generated in iteration $(k, l)$ for some $l$, will be in $\mathrm{lev}_{f_u^k} f$. However, the old bundle points, i.e. those generated
in $(k - j, l')$ for some $j$ and $l'$, can still be outside $\mathrm{lev}_{f_u^k} f$. At the end of iteration $(k, l_k)$, we remove the old
bundle points whose function values are greater than $f_u^k$. Finally, before we enter iteration $k + 1$ we move the
upper bound closer to $f(x^{k+1})$ by setting $f_u^{k+1} \leftarrow \alpha_3 f(x^{k+1}) + (1 - \alpha_3) f_u^k$ with some $\alpha_3 \in (0, 1)$.
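The pruning rule and the upper-bound update described above can be sketched in a few lines. This is only an illustrative sketch: the function names `prune_bundle` and `update_upper_bound`, and the representation of the bundle as a list of (point, function value) pairs, are assumptions made here and are not taken from the authors' MATLAB implementation.

```python
def prune_bundle(bundle, f_upper):
    """Drop bundle points whose function value exceeds the upper bound f_u^k.

    `bundle` is a list of (point, f_value) pairs (a hypothetical layout).
    """
    return [(y, fy) for (y, fy) in bundle if fy <= f_upper]


def update_upper_bound(f_upper, f_new_center, alpha3):
    """Return alpha3 * f(x^{k+1}) + (1 - alpha3) * f_u^k, with alpha3 in (0, 1)."""
    if not 0.0 < alpha3 < 1.0:
        raise ValueError("alpha3 must lie in (0, 1)")
    return alpha3 * f_new_center + (1.0 - alpha3) * f_upper


# Hypothetical data: three bundle points with their stored function values.
example_bundle = [((0.0,), 1.0), ((2.0,), 5.0), ((1.0,), 2.5)]
kept = prune_bundle(example_bundle, f_upper=3.0)
new_bound = update_upper_bound(3.0, 1.0, alpha3=0.5)
```

Because the new bound is a convex combination of $f(x^{k+1})$ and the old bound, it stays between the two, which is what keeps the new prox-center inside the level set used for pruning.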

We see the bundle point set $\Omega$ is updated dynamically so that at any iteration $(k, l)$, $\Omega \subset \mathrm{lev}_{f_u^{k-2}} f$. This
setting is related to the convexification process. In Lemma 3 the value of the parameter $a$ such that $g$ is a restriction
of a convex function depends on the set $F \subset \Omega$. The following lemma states the existence of such a value.

Lemma 4. Suppose that Assumption 1 holds. Given a prox-center $x \in \mathrm{lev}_{x^0} f$ and some $f_u \in (f(x), f(x^0)]$,
consider the model $m(x)$ defined in (24) with bundle points $y_i \in \Omega$ satisfying $y_i \in \mathrm{lev}_{f_u} f$ for all $i \in I$. Let $D$ be a
compact set such that $D \supseteq F$ and $\mathrm{int}\, D \ne \emptyset$. There exists a threshold $a^{th}(x, I) > 0$ such that for all $a \ge a^{th}(x, I)$,
$g(y; x, a)$ is the restriction to $D$ of a globally convex function $H(y; x, a)$ satisfying $g(y; x, a) \ge H(y; x, a)$ for all
$y \in \mathbb{R}^n$ and $x \in D$.

Proof. This lemma is an extension of Theorem 2. Since $y_i \in \mathrm{lev}_{f_u} f$ for all $i \in I$ and $f_u \in (f(x), f(x^0)]$,
we have by (42) that $F \subset \mathrm{lev}_{x^0} f$. As $F$ is a finite set and $D$ is the smallest compact set containing $F$, we
have $D \subset \mathrm{lev}_{x^0} f$. From Corollary 1 and the proof of Theorem 2 in the appendix, we can see that the same
conclusion holds true when we replace $\mathrm{lev}_{x^0} f$ by any of its compact subsets that has nonempty interior. In our
case each $(x, I)$ corresponds to a set $D \subset \mathrm{lev}_{x^0} f$ and for each $D$ there exists a threshold $a^{th}(x, I)$ satisfying the
corresponding conditions. □

From Lemma 4 we see that the condition for '$a$' in Lemma 3 can be satisfied if we take $a \ge a^{th}(x, I)$.

4.5 The LPBNC Algorithm

For the trust region update we follow the procedure [T]. In our algorithm, in order to distinguish the prox-center,
we differentiate major iterations and minor iterations. When we set a new prox-center $x^{k+1}$, it is also the last
minor iteration point, denoted by $x^{k l_k}$. As we have an infinite sequence of iterations, the following two situations
can happen. First, there are infinitely many major iterations, with each loop of minor iterations being finite.
The sequence can be described as follows:

$$x^{00},\ x^{01},\ \ldots,\ x^{0 l_0}\ (= x^{1}),\ x^{10},\ \ldots,\ x^{k0},\ \ldots,\ x^{k l_k}\ (= x^{k+1}),\ \ldots \qquad (45a)$$

Second, there are finitely many major iterations, with the last major iteration containing infinitely many minor
iterations, which can be described as:

$$x^{00},\ x^{01},\ \ldots,\ x^{k0},\ x^{k1},\ \ldots,\ x^{kl},\ x^{k(l+1)},\ \ldots \qquad (45b)$$

A set of similar notations goes for the sequences of parameters $(a, \Delta)$. Starting from $(a_0^0, \Delta_0^0)$, each pair $(a_l^k, \Delta_l^k)$
is used to produce $x^{kl}$, and if $l = l_k$ we say $x^{k l_k}$ was produced by $(a_{l_k}^k, \Delta_{l_k}^k)$. To alleviate notation we drop the
subscripts of $(a_l^k, \Delta_l^k)$ in Algorithm 3 below, and also in later analysis we define $m_l^k(x) := m(x; x^k, a_l^k, I(k, l))$
for all $k$ and $l$ and note the value of $m_l^k(x)$ is dependent on $x^k$, $a_l^k$ and $I(k, l)$.

Algorithm 3: LPBNC

Data: Final accuracy tolerance $\epsilon_{tol}$, maximum trust region radius $\Delta_{\max}$, initial trust region radius
$\Delta_0 \in (0, \Delta_{\max})$, initial point $x^0$, trust region parameters $0 < \eta_1 < \eta_3 < 1$ and $0 < \alpha_1 < 1 < \alpha_2$,
backtrack parameter $\beta \in (0, 1)$, parameter $\sigma \in \mathbb{R}_{>1}$ and increasing parameter for the convexification
parameter $\gamma \in [2, 10]$.

 1  Initialization: major and minor iteration counter $(k, l) \leftarrow (0, 0)$, initial convexification parameter $a \leftarrow 0$,
    $a^{\min} \leftarrow 0$, generate a cutting plane of $g(y; x^0, a)$ at $x^0$, prepare the information for the
    first LP, $f_u^0 \leftarrow f(x^0)$;
 2  solve the linear programming subproblem (26) with $x$, $I$ replaced by $x^k$, $I(k, l)$ and obtain an optimal
    solution $(x^{kl}, z^{kl})$;
 3  if $f(x^k) - z^{kl} \le (1 + |f(x^k)|)\,\epsilon_{tol}$ then
 4      STOP;
 5  end
 6  if $\rho_l^k := \dfrac{f(x^k) - f(x^{kl})}{f(x^k) - z^{kl}} \ge \eta_1$ then
 7      $l_k \leftarrow l$, $x^{k+1} \leftarrow x^{kl}$, serious $\leftarrow 1$;
        /* update trust region radius for major iteration */
 8      if $\rho_l^k \ge \eta_3$ and $\|x^{kl} - x^k\|_\infty \ge 0.9\Delta$ then
 9          $\Delta \leftarrow \min\{\alpha_2 \Delta, \Delta_{\max}\}$
10      end
11      for all $w \in \Omega$, if $w \notin \mathrm{lev}_{f_u^k} f$, delete $w$ from $\Omega$ and update the index set $I(k, l)$ by deleting the index
        whose corresponding cutting plane is generated at $w$;
12      $f_u^{k+1} \leftarrow \alpha_3 f(x^{k+1}) + (1 - \alpha_3) f_u^k$
13  else
14      serious $\leftarrow 0$;
        /* update trust region radius for minor iteration */
15      if $\rho_l^k < -\dfrac{1}{\min\{1, \Delta\}}$ then
16          $\Delta \leftarrow \alpha_1 \Delta$
17      end
18      if $k > 0$ and $f(x^{kl}) > f_u^k$ then    // backtrack
19          $d \leftarrow x^{kl} - x^k$; $j \leftarrow 1$;
20          while $f(x^k + \beta^j d) > f_u^k$ do
21              $j \leftarrow j + 1$
22          end
23          $y \leftarrow x^k + \beta^j d$
24      end
25  end
26  add $x^{k+1}$ in a major iteration, or add $x^{kl}$, or $y$ if backtrack was performed, as a new bundle point $y_i$ into
    $\Omega$, and update $a^{\min}$ according to its definition;
27  if $a < a^{\min}$ then
28      $a \leftarrow \max\{a^{\min}, \gamma a\}$
29  else if $a^{\min} > 0$ and $a > \sigma a^{\min}$ then
30      $a \leftarrow (a + a^{\min})/2$
31  end
32  if serious then
33      generate a cutting plane of $g(y; x^{k+1}, a)$ at $x^{k+1}$, and add the cutting-plane function to the model.
        (The model becomes $m_0^{k+1}$ and the cutting plane index set becomes $I(k+1, 0)$.) $k \leftarrow k + 1$, $l \leftarrow 0$;
34  else
35      generate a cutting plane of $g(y; x^k, a)$ at the new bundle point, and add the cutting-plane function to
        the model. (The model becomes $m_{l+1}^k$ and the cutting plane index set becomes $I(k, l+1)$.)
36      $l \leftarrow l + 1$
37  end
38  update the coefficient matrix in LP subproblem (26);
39  continue to next iteration by going to line 2

Remark. 1. The value of $a$ used in the LP subproblem (26) can always guarantee $E_i \ge 0$ for all $i \in I(k, l)$.

2. We set $x^{k+1}$ as $x^{kl}$ if $\rho_l^k \ge \eta_1$. This implies $f(x^{k+1}) \le f(x^k)$ and thus $x^k \in \mathrm{lev}_{x^0} f$ for all $k$.

3. At the beginning of iteration $k$, a cutting plane of $g(\cdot\,; x^k, a)$ is generated at $x^k$. From the update of $f_u^k$ we
see that $f(x^k) \le f_u^k$ for all $k$. At the end of iteration $k$, $x^k$ will not be deleted from $\Omega$ and consequently
$x^k$ is always indexed in $I(k, l)$ for all $0 \le l \le l_k$.
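The acceptance test and the trust-region radius update of Algorithm 3 can be sketched as a single function. This is a minimal sketch under stated assumptions: the name `trust_region_step` is hypothetical, the default parameter values are the ones read from the extracted text of Section 6, and the non-strict inequalities ($\ge$) are an assumption, since the extraction does not distinguish them from strict ones.

```python
def trust_region_step(rho, step_norm, delta, eta1=1e-4, eta3=0.4,
                      alpha1=0.25, alpha2=2.0, delta_max=1000.0):
    """Classify an iteration and update the trust-region radius.

    rho       -- ratio of actual reduction to model reduction
    step_norm -- infinity norm of the step, ||x^{kl} - x^k||
    delta     -- current radius (assumed strictly positive)
    Returns (serious, new_delta).
    """
    if rho >= eta1:
        # Major (serious) iteration: enlarge the radius only if the model
        # was very good and the step reached (almost) the boundary.
        if rho >= eta3 and step_norm >= 0.9 * delta:
            delta = min(alpha2 * delta, delta_max)
        return True, delta
    # Minor iteration: shrink only when the ratio is strongly negative.
    if rho < -1.0 / min(1.0, delta):
        delta = alpha1 * delta
    return False, delta


serious, radius = trust_region_step(rho=0.5, step_norm=1.0, delta=1.0)
```

Note that a minor iteration with a mildly negative ratio leaves the radius unchanged; in that case the model is improved by the new cutting plane alone.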


4.6 LPBNC is well-defined

The following lemma shows that if $x^{kl}$ is not located in $\mathrm{lev}_{f_u^k} f$, then after finitely many backtracking steps along the direction
$x^{kl} - x^k$ we reach an auxiliary point in $\mathrm{lev}_{f_u^k} f$.

Lemma 5. If at iteration $(k, l)$ we have $f(x^{kl}) > f_u^k$ and $k > 0$, then there exists an integer $j'$ such that
$y = x^k + \beta^{j'} d$ and $y \in \mathrm{lev}_{f_u^k} f$ with $d = x^{kl} - x^k$.

Proof. From the algorithm we see $f_u^0 = f(x^0) \ge f(x^k)$ and $f_u^k = \alpha_3 f(x^k) + (1 - \alpha_3) f_u^{k-1}$ if $k > 0$. Hence,
$f_u^k - f(x^k) > 0$ for all $k > 0$. Suppose for contradiction that $f(x^k + \beta^j d) > f_u^k$ for all integers $j > 0$.
Hence, $0 < f_u^k - f(x^k) < f(x^k + \beta^j d) - f(x^k) \le \beta^j \|d\|_\infty L$, where $L$ is the Lipschitz constant of $f$. Therefore
$\beta^j > \dfrac{f_u^k - f(x^k)}{\|d\|_\infty L} =: c_1$. Since $\beta \in (0, 1)$, clearly this cannot be true for all $j > 0$; a contradiction. □
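The backtracking loop whose finite termination Lemma 5 establishes can be sketched as follows. This is a hedged illustration: the function name `backtrack`, the list-based vectors and the safety cap `max_steps` are assumptions added here; the loop itself follows lines 18 to 23 of Algorithm 3 and presumes $f(x^k) < f_u^k$.

```python
def backtrack(f, x_center, x_trial, f_upper, beta, max_steps=200):
    """Shrink the step d = x_trial - x_center by powers of beta until
    f(x_center + beta**j * d) <= f_upper. Returns (y, j)."""
    d = [t - c for t, c in zip(x_trial, x_center)]
    j = 1
    while j <= max_steps:
        y = [c + (beta ** j) * di for c, di in zip(x_center, d)]
        if f(y) <= f_upper:
            return y, j
        j += 1
    # Not reached when f is Lipschitz and f(x_center) < f_upper (Lemma 5);
    # the cap only guards against misuse.
    raise RuntimeError("backtracking did not terminate within max_steps")


def l1_norm(x):
    """A simple nonsmooth, Lipschitz test function (hypothetical example)."""
    return sum(abs(xi) for xi in x)


point, steps = backtrack(l1_norm, [1.0, 0.0], [5.0, 0.0], f_upper=1.5, beta=0.5)
```

In this example the trial point has value 5, well above the bound 1.5, and three halvings of the step are needed before the bound is met.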

We want to show that the algorithm LPBNC is well defined by showing that the inner loop (loop of minor
iterations) can terminate finitely and that, if it does not terminate finitely, then we have already found a
stationary point of $f$.

Lemma 6. Suppose $\rho_l^k < \eta_1$ for some $k, l$. Then $m_{l+1}^k(x^{k(l+1)}) \ge m_l^k(x^{kl})$.

Proof. At the end of iteration $(k, l)$, we do not delete any cutting planes but add a new cutting plane to the model.
Furthermore, the trust region radius $\Delta$ is possibly decreased. Hence the feasible region of linear subproblem
(26) for iteration $(k, l + 1)$ will become smaller. Therefore we have $m_{l+1}^k(x^{k(l+1)}) \ge m_l^k(x^{kl})$. □

Define

$$\bar{L} := \sup\{\|s\|_1 : s \in \partial f(y),\ \|y - x\|_\infty \le \Delta_{\max},\ x \in \mathrm{lev}_{x^0} f\}. \qquad (46)$$

From the Lipschitz continuity and prox-regularity of $f$ we have $\bar{L} < +\infty$.

The notion of minor iteration is similar to the null step in bundle methods. The following lemma shows
that minor iterations either terminate finitely or generate an infinite sequence with very small model reduction.
Furthermore, as we can see from below, the model reduction eventually decreases to 0 if minor iterations do not
terminate finitely. If $f$ is convex, this shows that the current iteration point is already
a global minimizer of $f$. For the nonconvex case, we will show that during the infinite minor iterations, if the
convexification succeeds, i.e. the function $g$ is eventually convex locally around $x^k$, then $x^k$ is a stationary point
of $f$.

Lemma 7. Suppose that $\epsilon_{tol} = 0$ and $a_l^k$ is bounded above by a constant $A$ for all $k$ and $l$. Let $l_1$ be any index
such that $\rho_{l_1}^k < \eta_1$. Then there is an index $l_2 > l_1$ and a real number $\eta_2 \in (\eta_1, 1)$ such that either $\rho_{l_2}^k \ge \eta_1$ or

$$\frac{f(x^k) - m_{l_2}^k(x^{k l_2})}{f(x^k) - m_{l_1}^k(x^{k l_1})} \le \eta_2. \qquad (47)$$

Proof. Suppose for contradiction that there does not exist such an index $l_2$ and real number $\eta_2$; that is, there is
an infinite sequence of minor iterations and

$$\frac{f(x^k) - m_q^k(x^{kq})}{f(x^k) - m_{l_1}^k(x^{k l_1})} > \eta_2, \quad \forall\, q > l_1,\ \forall\, \eta_2 \in (\eta_1, 1). \qquad (48)$$

Since $l_1$ can be any index such that $\rho_{l_1}^k < \eta_1$, and we do not delete cutting planes in minor iterations, we can
assume that $q$ and $l$ are generic indices satisfying $q > l \ge l_1$.
To construct a contradiction, write $f(x^k) - m_q^k(x) = [f(x^k) - m_q^k(x^{kl})] + [m_q^k(x^{kl}) - m_q^k(x)]$. Consider the
two parts of the right hand side of this equation. First, observing that $x^{kl}$ is the point where a new cutting
plane of $g(y; x^k, a_l^k)$ is generated and all the cutting planes in minor iterations are kept in the model, it follows
that $g(x^{kl}; x^k, a_l^k) \le m_q^k(x^{kl})$. Consequently,

$$\begin{aligned}
f(x^k) - m_q^k(x^{kl}) &\le f(x^k) - g(x^{kl}; x^k, a_l^k) = f(x^k) - f(x^{kl}) - \frac{a_l^k}{2}\|x^{kl} - x^k\|^2 \\
&\le f(x^k) - f(x^{kl}) < \eta_1 [f(x^k) - m_l^k(x^{kl})] \quad (\text{because } \rho_l^k < \eta_1) \\
&\le \eta_1 [f(x^k) - m_{l_1}^k(x^{k l_1})] \quad (\text{by Lemma 6}).
\end{aligned} \qquad (49)$$

Second, the model function $m_q^k(x)$ is convex. Therefore for all $s \in \partial m_q^k(x^{kl})$,

$$m_q^k(x) - m_q^k(x^{kl}) \ge s^T (x - x^{kl}) \quad \forall\, x. \qquad (50)$$

Note $s$ is a subgradient of $m_q^k$ at $x^{kl}$ where a cutting plane of $g(\cdot\,; x^k, a_l^k)$ was generated, and thus $s \in
\partial g(x^{kl}; x^k, a_l^k) = \partial f(x^{kl}) + a_l^k (x^{kl} - x^k)$. It follows from (46), (26c) and the boundedness of $a_l^k$ that $\|s\|_1 \le
\bar{L} + \|a_l^k (x^{kl} - x^k)\|_1 \le \bar{L} + A n \Delta_{\max}$. Applying this to (50) we have

$$m_q^k(x^{kl}) - m_q^k(x) \le \|s\|_1 \|x - x^{kl}\|_\infty \le (\bar{L} + A n \Delta_{\max}) \|x - x^{kl}\|_\infty \quad \forall\, x. \qquad (51)$$

Summing (49) and (51), there is

$$f(x^k) - m_q^k(x) \le \eta_1 [f(x^k) - m_{l_1}^k(x^{k l_1})] + (\bar{L} + A n \Delta_{\max}) \|x - x^{kl}\|_\infty \quad \forall\, x. \qquad (52)$$

It then follows from (48) and (52), by taking $x$ as $x^{kq}$, that

$$\eta_2 [f(x^k) - m_{l_1}^k(x^{k l_1})] < \eta_1 [f(x^k) - m_{l_1}^k(x^{k l_1})] + (\bar{L} + A n \Delta_{\max}) \|x^{kq} - x^{kl}\|_\infty \quad \forall\, q > l \ge l_1.$$

Thus

$$\|x^{kq} - x^{kl}\|_\infty > \frac{\eta_2 - \eta_1}{\bar{L} + A n \Delta_{\max}} [f(x^k) - m_{l_1}^k(x^{k l_1})] =: c_2 > 0. \qquad (53)$$

However, this cannot happen for an infinite number of indices $q$ and $l$ because all minor iteration points $x^{kl}$ such
that $l \ge l_1$ are in the neighborhood of $x^k$, $B(x^k, \Delta_{\max}) = \{x \mid \|x - x^k\|_\infty \le \Delta_{\max}\}$. Hence a contradiction is
found and (47) must be true for some $l_2 > l_1$ and $\eta_2 > \eta_1$. □


It is worth mentioning that the proofs of Lemma 5 and Lemma 7 do not require the convexity of $f$. So far we have
shown that the minor iterations either terminate finitely or continue infinitely. To demonstrate that our algorithm
is well defined, we need to show in the latter case that the current major iteration point is a stationary point
of $f$ and that $a_l^k$ is indeed bounded above. We show this in the next section.


5 Convergence analysis


Theorem 3. Let Assumption 1 hold and $\epsilon_{tol} = 0$. Suppose Algorithm 3 terminates at iteration $(k, l)$. If
$a_l^k \ge a^{th}(x^k, I(k, l))$ then $x^k \in P_{a_l^k}(x^k)$ and $0 \in \partial f(x^k)$.

Proof. As the algorithm terminates at $x^{kl}$, $x^{kl}$ must satisfy the stopping criterion $f(x^k) - z^{kl} \le (1 + |f(x^k)|)\,\epsilon_{tol}$.
As $\epsilon_{tol} = 0$, we have $f(x^k) - z^{kl} \le 0$. However, by the expression of model reduction in (29), $f(x^k) - z^{kl} \ge 0$.
Thus $f(x^k) - z^{kl} = 0$. Suppose for contradiction $x^k \notin P_{a_l^k}(x^k)$. By (43), Lemma 3 and Lemma 4, if $a_l^k \ge
a^{th}(x^k, I(k, l))$ and $p \in P_{a_l^k}(x^k)$, then we have

$$0 = f(x^k) - z^{kl} \ge [f(x^k) - e_{a_l^k}(x^k)] \min\Big\{ \frac{\Delta_l^k}{\|x^k - p\|_\infty}, 1 \Big\} \ge 0. \qquad (54)$$

From the definition of $P_{a_l^k}(x^k) \ni p$ we have

$$f(x^k) - e_{a_l^k}(x^k) = g(x^k; x^k, a_l^k) - g(p; x^k, a_l^k) > 0, \quad \text{if } x^k \notin P_{a_l^k}(x^k). \qquad (55)$$

From (54) and (55) we see $0 = \min\{\frac{\Delta_l^k}{\|x^k - p\|_\infty}, 1\}$. Thus $0 = \frac{\Delta_l^k}{\|x^k - p\|_\infty}$ with $\|x^k - p\|_\infty > \Delta_l^k$. But $\Delta_l^k$ cannot
be reduced to 0 after finitely many iterations. Hence we have a contradiction. □

Let $\epsilon_{tol}$ be 0. In the following analysis we assume the algorithm does not stop finitely. We see in (45) that
two sequences can be generated. Consistent notations should be used for $a$ and $a^{\min}$. For clear understanding,
we unify those two cases with $\{a^n\}$ and $\{a^{\min}\}$ when it is not necessary to distinguish them. We start our
convergence analysis by showing that the convexification parameter $a$ in algorithm LPBNC is bounded.

Lemma 8. $0 \le a^n \le \gamma a^{th}$ $\forall\, n \in \mathbb{N}$.

Proof. The update of $a^n$ happens only in line 28 or line 30 of Algorithm 3. In either case we have $a^n \ge a^{\min} \ge 0$
from the definition of $a^{\min}$. We increase $a$ in line 28 and decrease $a$ in line 30. To show $a^n \le \gamma a^{th}$ we only
need to show $\max\{a^{\min}, \gamma a^n\} \le \gamma a^{th}$ if $a^n < a^{\min}$. As $\gamma \ge 2$ and $a^{\min} \le a^{th}$ for all $n \in \mathbb{N}$, we clearly have
$\max\{a^{\min}, \gamma a^n\} \le \gamma a^{\min} \le \gamma a^{th}$. □
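The update of the convexification parameter analysed in Lemma 8 can be sketched as a small function. This is an illustrative sketch only: the name `update_a` is hypothetical, and the first condition of the decrease branch (`a_min > 0`) is an assumption, because the corresponding token is lost in the extracted algorithm.

```python
def update_a(a, a_min, gamma, sigma):
    """One update of the convexification parameter a.

    gamma -- increase factor (the paper takes gamma in [2, 10])
    sigma -- factor > 1 that triggers a decrease when a is much larger than a_min
    """
    if a < a_min:
        # Increase: jump at least to a_min, or multiply by gamma.
        return max(a_min, gamma * a)
    if a_min > 0 and a > sigma * a_min:
        # Decrease: move halfway towards a_min, which keeps a >= a_min.
        return 0.5 * (a + a_min)
    return a


increased = update_a(0.8, 1.0, gamma=2.0, sigma=3.0)
decreased = update_a(10.0, 1.0, gamma=2.0, sigma=3.0)
```

In every branch the returned value is at least `a_min`, which is the lower bound used in the proof of Lemma 8.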

A consequence of Lemma 7 is that if the minor iteration sequence does not terminate finitely, then the model
reduction will become smaller and smaller and eventually converge to 0. We will show that if there is an infinite
sequence of serious steps, the model reduction will converge to 0 too. Denote the index of the last minor
iteration as $l_k$ so that $x^{k l_k} = x^{k+1}$.

Lemma 9. The model reduction of LPBNC converges to 0. Specifically,

(i) if in iteration $k$ there is an infinite sequence of minor iterations then

$$\lim_{l \to \infty} [m_l^k(x^k) - m_l^k(x^{kl})] = 0; \qquad (56)$$

(ii) if the sequence of major iteration points $\{x^k\}$ is infinite then

$$\lim_{k \to \infty} [m_{l_k}^k(x^k) - m_{l_k}^k(x^{k+1})] = 0. \qquad (57)$$

Proof. (i) From Lemma 2 we know $m_l^k(x^k) = f(x^k)$ for all $(k, l)$ and the sequence $\{f(x^k) - m_l^k(x^{kl})\}_{l=0}^{\infty}$ is
nonnegative. From Lemma 6 we see this sequence is also monotonic. Since the sequence is infinite, by Lemma 7 there
is an infinite sequence of indices $0 \le l_1 < l_2 < \cdots$ such that $0 \le f(x^k) - m_{l_j}^k(x^{k l_j}) \le \eta_2 [f(x^k) - m_{l_{j-1}}^k(x^{k l_{j-1}})] \le
\cdots \le \eta_2^{j-1} [f(x^k) - m_{l_1}^k(x^{k l_1})]$, where $j$ can be infinitely large. Consequently, (56) holds true.

(ii) From item 2 of the Remark following Algorithm 3 we see the sequence $\{f(x^k)\}_{k=0}^{\infty}$ is monotonic. Under Assumption 1, $f$ is bounded
below. Hence $\lim_{k \to \infty} [f(x^k) - f(x^{k+1})] = 0$.

From the definition of $\rho_{l_k}^k$ and Lemma 2 we have $f(x^k) - f(x^{k l_k}) \ge \eta_1 [f(x^k) - z^{k l_k}] = \eta_1 (m_{l_k}^k(x^k) - m_{l_k}^k(x^{k l_k}))$.
Since both $\{f(x^k) - f(x^{k+1})\}$ and $\{m_{l_k}^k(x^k) - m_{l_k}^k(x^{k+1})\}$ are nonnegative we must have $\lim_{k \to \infty} [m_{l_k}^k(x^k) -
m_{l_k}^k(x^{k+1})] = 0$. □

Let $L$ be the Lipschitz constant of $f$. We are now ready to prove the convergence theorem of LPBNC under
Assumption 1.

Theorem 4. Let Assumption 1 hold and $\epsilon_{tol} = 0$. Suppose Algorithm 3 generates an infinite number of minor
iterations after the $k$-th major iteration. For every infinite subsequence $\mathcal{K} \subset \mathbb{N}$, if $B(\mathcal{K}) := \{l \in \mathcal{K} \mid a_l^k <
a^{th}(x^k, I(k, l))\}$ is a finite set, then

(i) there exists $\{p_l\}_{l \in \mathcal{K}}$ such that $p_l \in P_{a_l^k}(x^k)$ and $\|p_l - x^k\|_\infty \to 0$;

(ii) $0 \in \partial f(x^k)$.

Proof. (i) Let $\mathcal{K} \subset \mathbb{N}$ be an infinite subsequence and $B(\mathcal{K})$ be a finite set; then there exists $N_1 \in \mathcal{K}$ such that
$a_l^k \ge a^{th}(x^k, I(k, l))$ for all $l \ge N_1$ and $l \in \mathcal{K}$. By Lemma 4, $a_l^k$ satisfies the required conditions in Lemma 3 for
all $l \ge N_1$ and $l \in \mathcal{K}$. Suppose for contradiction that for all sequences $\{p_l\}_{l \in \mathcal{K}}$ such that $p_l \in P_{a_l^k}(x^k)$, there
exist $\epsilon > 0$ and $N_2 \in \mathcal{K}$ such that $\|p_l - x^k\|_\infty \ge \epsilon$ for all $l \ge N_2$ and $l \in \mathcal{K}$. Then conclusion (43) can be applied
with $(x, x^*, a, \Delta, p, I)$ replaced by $(x^k, x^{kl}, a_l^k, \Delta_l^k, p_l, I(k, l))$ for all $l \ge N_3 := \max\{N_1, N_2\}$ and $l \in \mathcal{K}$.
For simplicity of notation we drop the superscript $k$ and set $(x^l, a_l, \Delta_l, \rho_l) := (x^{kl}, a_l^k, \Delta_l^k, \rho_l^k)$. We have

$$m(x^k) - m(x^l) \ge [f(x^k) - e_{a_l}(x^k)] \min\Big\{ \frac{\Delta_l}{\|x^k - p_l\|_\infty}, 1 \Big\} \ge 0, \quad \forall\, l \in \mathcal{K}_{\ge N_3}. \qquad (58)$$

From Lemma 9(i) we have $m(x^k) - m(x^l) \to 0$ as $l \to \infty$. Consequently,

$$[f(x^k) - e_{a_l}(x^k)] \min\Big\{ \frac{\Delta_l}{\|x^k - p_l\|_\infty}, 1 \Big\} \to 0. \qquad (59)$$

We first show that

$$[f(x^k) - e_{a_l}(x^k)] \not\to 0. \qquad (60)$$

Suppose for contradiction that $[f(x^k) - e_{a_l}(x^k)] \to 0$. By Lemma 8, $\{a_l\}_{l \in \mathcal{K}}$ is bounded, and hence there exist $a^*$
and $\bar{\mathcal{K}} \subset \mathcal{K}$ such that $a_l \to a^*$ along $\bar{\mathcal{K}}$. As the mapping $e_{(\cdot)}(x^k)$ is continuous, we have $f(x^k) = e_{a^*}(x^k)$ and equivalently
$x^k \in P_{a^*}(x^k)$. From the outer semicontinuity of the mapping $P_{(\cdot)}(x^k)$, there exist $\epsilon' < \epsilon$ and $N_4 \in \bar{\mathcal{K}}$ such that
$\min_{p \in P_{a_l}(x^k)} \|x^k - p\|_\infty < \epsilon'$ for all $l \in \bar{\mathcal{K}}_{\ge N_4}$. However, this cannot be true because $\epsilon' < \epsilon$, $\bar{\mathcal{K}} \subset \mathcal{K}$ and we have
supposed that $\|p_l - x^k\|_\infty \ge \epsilon$ for all $N_2 \le l \in \mathcal{K}$, where $p_l$ can be an arbitrary element of $P_{a_l}(x^k)$. We have
finished showing (60), which together with (59) yields

$$\frac{\Delta_l}{\|x^k - p_l\|_\infty} \to 0. \qquad (61)$$

Next we show

$$\{\|x^k - p_l\|_\infty\}_{l \in \mathcal{K}} \text{ is bounded.} \qquad (62)$$

By Definition 2.4, $f(p_l) \le e_{a_l}(x^k) \le f(x^k)$, and therefore $p_l \in \mathrm{lev}_{x^k} f \subset \mathrm{lev}_{x^0} f$. We also have $x^k \in \mathrm{lev}_{x^0} f$,
which is bounded. Hence $\{\|x^k - p_l\|_\infty\}_{l \in \mathcal{K}}$ is bounded above. We have supposed that $\|p_l - x^k\|_\infty \ge \epsilon$ for all
$N_2 \le l \in \mathcal{K}$, hence $\{\|x^k - p_l\|_\infty\}_{l \in \mathcal{K}}$ is bounded below and (62) is true. From (61) and (62) we have

$$\Delta_l \to 0. \qquad (63)$$

Then line 16 of Algorithm 3 must be executed infinitely many times, which implies that there exists an infinite
subsequence $\mathcal{K}^* \subset \mathbb{N}$ such that $\rho_l < -\frac{1}{\min\{\Delta_l, 1\}}$ for all $l \in \mathcal{K}^*$ and $\mathcal{K}' := \mathcal{K} \cap \mathcal{K}^*$ is infinite. By the definition of $\rho_l$,
the Lipschitz continuity of $f$, and the feasibility of $x^l$ to problem (26) we have

$$m(x^k) - m(x^l) \le [f(x^l) - f(x^k)] \min\{\Delta_l, 1\} \le L \|x^k - x^l\|_\infty \min\{\Delta_l, 1\} \le L \Delta_l^2, \quad \forall\, l \in \mathcal{K}'. \qquad (64)$$

By (63) there exists $N_5 \in \mathcal{K}_{\ge N_2}$ such that $\Delta_l < \epsilon \le \|p_l - x^k\|_\infty$ for all $l \in \mathcal{K}_{\ge N_5}$. This implies
$\min\{\frac{\Delta_l}{\|x^k - p_l\|_\infty}, 1\} = \frac{\Delta_l}{\|x^k - p_l\|_\infty}$, and from (58) we have

$$f(x^k) - e_{a_l}(x^k) \le \frac{\|x^k - p_l\|_\infty}{\Delta_l} [m(x^k) - m(x^l)] \le L \|x^k - p_l\|_\infty \Delta_l, \quad \forall\, l \in (\mathcal{K}_{\ge N_6} \cap \mathcal{K}'), \qquad (65)$$

where the last inequality follows from (64) and $N_6 := \max\{N_3, N_5\}$. From (62), (63), (65) and the fact that
$\mathcal{K}' \subset \mathcal{K}$ we have

$$[f(x^k) - e_{a_l}(x^k)] \to 0 \text{ along } \mathcal{K}'. \qquad (66)$$

In (60) and its proof the $\mathcal{K}$ can be replaced by any infinite subsequence $\tilde{\mathcal{K}}$ such that $\|p_l - x^k\|_\infty \ge \epsilon$ for all
$l \in \tilde{\mathcal{K}}_{\ge N_2}$ with $p_l \in P_{a_l}(x^k)$. Consequently, (66) cannot be true and we have found a contradiction.

(ii) Finally, to see $0 \in \partial f(x^k)$, note $a_l (x^k - p_l) \in \partial f(p_l)$ for all $p_l \in P_{a_l}(x^k)$ and $l \in \mathcal{K}$. By the outer
semicontinuity of the proximal mapping and subdifferential, when $\|p_l - x^k\|_\infty \to 0$, as long as $\{a_l\}_{l \in \mathcal{K}}$ is
bounded, which is true from Lemma 8, we have $0 \in \partial f(x^k)$. □

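We denote the minor iterations between the $k$-th major iteration and the $(k+1)$-th major iteration by $M(k) :=
\{0, 1, \ldots, l_k\}$, with $l_k = 0$ if there is no minor iteration in between. We will need the following assumption.

Assumption 2. If there exists a sequence of indices $\{j_k\}_{k \in \mathcal{K}}$ with $j_k \in M(k)$ such that $G := \{k \in \mathcal{K} \mid \rho_{j_k}^k <
-\frac{1}{\min\{\Delta_{j_k}^k, 1\}}\}$ is an infinite set, then $\{k \in G \mid a_{j_k}^k < a^{th}(x^k, I(k, j_k))\}$ is a finite set.

Theorem 5. Let Assumption 1 and Assumption 2 hold and $\epsilon_{tol} = 0$. Suppose Algorithm 3 generates an infinite
number of major iterations. For every subsequence $\mathcal{K} \subset \mathbb{N}$ and $\bar{x}$ such that $x^k \to \bar{x}$ along $\mathcal{K}$, there exists an associated
sequence of indices $\{i_k\}_{k \in \mathcal{K}}$ with $i_k \in M(k)$ such that if $E(\mathcal{K}) := \{k \in \mathcal{K} \mid a_{i_k}^k < a^{th}(x^k, I(k, i_k))\}$ is a finite set,
then

(i) there exists $\{p^k\}_{k \in \mathcal{K}}$ with $p^k \in P_{a_{i_k}^k}(x^k)$ such that $\|x^k - p^k\|_\infty \to 0$;

(ii) $0 \in \partial f(\bar{x})$.

Proof. (i) Let $\mathcal{K} \subset \mathbb{N}$ be an infinite subsequence such that $x^k \to \bar{x}$ along $\mathcal{K}$. Suppose for contradiction that for all
sequences $\{p^k\}_{k \in \mathcal{K}}$ and $\{i_k\}_{k \in \mathcal{K}}$ with $i_k \in M(k)$, $p^k \in P_{a_{i_k}^k}(x^k)$ and $E(\mathcal{K})$ finite, there exist $\epsilon > 0$ and $M_1 \in \mathcal{K}$
such that $\|x^k - p^k\|_\infty \ge \epsilon$ for all $k \in \mathcal{K}_{\ge M_1}$. We first show that

$$[f(x^k) - e_{a_{i_k}^k}(x^k)] \not\to 0. \qquad (67)$$

Suppose for contradiction that $[f(x^k) - e_{a_{i_k}^k}(x^k)] \to 0$ along $\mathcal{K}$. By Lemma 8, $\{a_{i_k}^k\}_{k \in \mathcal{K}}$ is bounded, and hence there
exist $a^*$ and $\bar{\mathcal{K}} \subset \mathcal{K}$ such that $a_{i_k}^k \to a^*$ along $\bar{\mathcal{K}}$. As both $f(\cdot)$ and $e_{(\cdot)}(\cdot)$ are continuous, we have $f(\bar{x}) = e_{a^*}(\bar{x})$ and
equivalently $\bar{x} \in P_{a^*}(\bar{x})$. From the outer semicontinuity of the mapping $P_{(\cdot)}(\cdot)$, there exist $\epsilon' < \epsilon$ and $M_2 \in \bar{\mathcal{K}}$
such that $\min_{p \in P_{a_{i_k}^k}(x^k)} \|x^k - p\|_\infty < \epsilon'$ for all $k \in \bar{\mathcal{K}}_{\ge M_2}$. However, this cannot be true because $\epsilon' < \epsilon$, $\bar{\mathcal{K}} \subset \mathcal{K}$ and
we have supposed that $\|x^k - p^k\|_\infty \ge \epsilon$ for all $k \in \mathcal{K}_{\ge M_1}$, where $p^k$ can be an arbitrary element of $P_{a_{i_k}^k}(x^k)$.
We have finished showing (67).

Take $i_k = l_k$ for all $k \in \mathcal{K}$. Since $E(\mathcal{K})$ is a finite set, there exists $M_3 \in \mathcal{K}$ such that $a_{l_k}^k \ge a^{th}(x^k, I(k, l_k))$
for all $k \in \mathcal{K}_{\ge M_3}$. By Lemma 4, $a_{l_k}^k$ satisfies the required conditions in Lemma 3 for all $k \in \mathcal{K}_{\ge M_3}$. Then
conclusion (43) can be applied with $(x, x^*, a, \Delta, p, I)$ replaced by $(x^k, x^{k+1}, a_{l_k}^k, \Delta_{l_k}^k, p^k, I(k, l_k))$ for all
$k \in \mathcal{K}_{\ge M_4}$ where $M_4 := \max\{M_1, M_3\}$, i.e.

$$m_{l_k}^k(x^k) - m_{l_k}^k(x^{k+1}) \ge [f(x^k) - e_{a_{l_k}^k}(x^k)] \min\Big\{ \frac{\Delta_{l_k}^k}{\|x^k - p^k\|_\infty}, 1 \Big\} \ge 0, \quad \forall\, k \in \mathcal{K}_{\ge M_4}.$$

From Lemma 9(ii) we have $m_{l_k}^k(x^k) - m_{l_k}^k(x^{k+1}) \to 0$ as $k \to \infty$. Consequently,

$$[f(x^k) - e_{a_{l_k}^k}(x^k)] \min\Big\{ \frac{\Delta_{l_k}^k}{\|x^k - p^k\|_\infty}, 1 \Big\} \to 0. \qquad (68)$$

From (67) and (68) we have

$$\frac{\Delta_{l_k}^k}{\|x^k - p^k\|_\infty} \to 0. \qquad (69)$$

Next we show

$$\{\|x^k - p^k\|_\infty\}_{k \in \mathcal{K}} \text{ is bounded.} \qquad (70)$$

By Definition 2.4, $f(p^k) \le e_{a_{l_k}^k}(x^k) \le f(x^k)$, and therefore $p^k \in \mathrm{lev}_{x^k} f \subset \mathrm{lev}_{x^0} f$. We also have $x^k \in \mathrm{lev}_{x^0} f$,
which is bounded. Hence $\{\|x^k - p^k\|_\infty\}_{k \in \mathcal{K}}$ is bounded above. We have supposed that $\|x^k - p^k\|_\infty \ge \epsilon$ for all
$k \in \mathcal{K}_{\ge M_1}$, hence $\{\|x^k - p^k\|_\infty\}_{k \in \mathcal{K}}$ is bounded below and (70) is true. From (69) and (70) we have

$$\Delta_{l_k}^k \to 0 \text{ along } \mathcal{K}. \qquad (71)$$

Then line 16 of Algorithm 3 must be executed infinitely many times, which implies that there exists a subsequence
$\mathcal{K}^* \subset \mathbb{N}$ such that $\mathcal{K}' := \mathcal{K} \cap \mathcal{K}^*$ is infinite and for all $k \in \mathcal{K}^*$, $\rho_{j_k}^k < -\frac{1}{\min\{\Delta_{j_k}^k, 1\}}$ with $j_k \in M(k)$. By the
definition of $\rho_{j_k}^k$, the Lipschitz continuity of $f$, and the feasibility of $x^{k j_k}$ to problem (26) we have

$$\begin{aligned}
m_{j_k}^k(x^k) - m_{j_k}^k(x^{k j_k}) &\le [f(x^{k j_k}) - f(x^k)] \min\{\Delta_{j_k}^k, 1\} \\
&\le L \|x^k - x^{k j_k}\|_\infty \min\{\Delta_{j_k}^k, 1\} \\
&\le L (\Delta_{j_k}^k)^2, \quad \forall\, k \in \mathcal{K}',
\end{aligned} \qquad (72)$$

where $m_{j_k}^k(\cdot) := m(\cdot\,; x^k, a_{j_k}^k, I(k, j_k))$.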

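We now show $\Delta_{j_k}^k \to 0$ along $\mathcal{K}'$. Suppose for contradiction that $\{\Delta_{j_k}^k\}_{k \in \mathcal{K}'}$ is bounded away from 0. From (71) and
the fact that $\mathcal{K}' \subset \mathcal{K}$, there exists $M_5 \in \mathcal{K}'$ such that

$$\Delta_{l_k}^k < \Delta_{j_{k+1}}^{k+1} \quad \text{for all } k \in \mathcal{K}'_{\ge M_5}. \qquad (73)$$

From the update of the trust region in Algorithm 3 we see that in minor iterations the trust region radius is not increased
and in major iterations the trust region radius is increased under some conditions. Thus (73) implies that

$$\Delta_{j_{k+1}}^{k+1} \le \Delta_0^{k+1} = \min\{\alpha_2 \Delta_{l_k}^k, \Delta_{\max}\} \le \alpha_2 \Delta_{l_k}^k \quad \text{for all } k \in \mathcal{K}'_{\ge M_5}. \qquad (74)$$

From (71), $\mathcal{K}' \subset \mathcal{K}$, (73) and (74), we have

$$\Delta_{j_{k+1}}^{k+1} \to 0 \text{ along } \mathcal{K}'. \qquad (75)$$

We have finished showing $\Delta_{j_k}^k \to 0$ along $\mathcal{K}'$ by reductio ad absurdum. Now for each $j_k$ there exists $p_{j_k}^k \in P_{a_{j_k}^k}(x^k)$. If
$\|p_{j_k}^k - x^k\|_\infty \to 0$ along $\mathcal{K}'$, we have found a sequence satisfying conclusion (i). Thus we suppose for contradiction that
$\{\|p_{j_k}^k - x^k\|_\infty\}_{k \in \mathcal{K}'}$ is bounded away from 0. Consequently, there exists $M_6 \in \mathcal{K}'$ such that $\Delta_{j_k}^k < \epsilon \le \|p_{j_k}^k - x^k\|_\infty$
for all $k \in \mathcal{K}'_{\ge M_6}$. This implies $\min\{\frac{\Delta_{j_k}^k}{\|x^k - p_{j_k}^k\|_\infty}, 1\} = \frac{\Delta_{j_k}^k}{\|x^k - p_{j_k}^k\|_\infty}$ for all $k \in \mathcal{K}'_{\ge M_6}$. Under Assumption 2,
$\{k \in G \mid a_{j_k}^k < a^{th}(x^k, I(k, j_k))\}$ is a finite set, and therefore there exists $M_7 \in \mathcal{K}'$ such that Lemma 3 can be
applied with $(x, x^*, a, \Delta, p, I)$ replaced by $(x^k, x^{k j_k}, a_{j_k}^k, \Delta_{j_k}^k, p_{j_k}^k, I(k, j_k))$ for all $k \in \mathcal{K}'_{\ge M_7}$, i.e.

$$f(x^k) - e_{a_{j_k}^k}(x^k) \le \frac{\|x^k - p_{j_k}^k\|_\infty}{\Delta_{j_k}^k} [m_{j_k}^k(x^k) - m_{j_k}^k(x^{k j_k})], \quad \forall\, k \in \mathcal{K}'_{\ge M_8}, \qquad (76)$$

where $M_8 := \max\{M_6, M_7\}$. From (72) and (76) we have

$$f(x^k) - e_{a_{j_k}^k}(x^k) \le L \|x^k - p_{j_k}^k\|_\infty \Delta_{j_k}^k, \quad \forall\, k \in \mathcal{K}'_{\ge M_8}. \qquad (77)$$

The $p^k$ in (70) and its proof can be replaced by $p_{j_k}^k$ and thus $\{\|x^k - p_{j_k}^k\|_\infty\}_{k \in \mathcal{K}'}$ is bounded. This together
with (75) and (77) yields $f(x^k) - e_{a_{j_k}^k}(x^k) \to 0$ along $\mathcal{K}'$. We can easily check that the $\mathcal{K}$ in (67) and its proof can be
replaced by $\mathcal{K}'$ with $i_k$ replaced by $j_k$ and $p^k$ replaced by $p_{j_k}^k$. Hence we have $f(x^k) - e_{a_{j_k}^k}(x^k) \not\to 0$ along $\mathcal{K}'$. We have
found a contradiction and conclusion (i) holds true.

(ii) To see $0 \in \partial f(\bar{x})$, note $a_{i_k}^k (x^k - p^k) \in \partial f(p^k)$ for all $p^k \in P_{a_{i_k}^k}(x^k)$ and $k \in \mathcal{K}$. By the outer semicontinuity
of the proximal mapping and subdifferential, when $x^k \to \bar{x}$ and $\|p^k - x^k\|_\infty \to 0$, as long as $\{a_{i_k}^k\}_{k \in \mathcal{K}}$ is bounded,
which is true from Lemma 8, we have $0 \in \partial f(\bar{x})$. □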


6 Numerical experiments 

In this section we report some preliminary numerical results on implementations of LPBC and LPBNC. Here our
goal is to provide a proof of principle only. For nonconvex examples we demonstrate that the conditions stated
in the convergence theorems can be satisfied. The two algorithms were programmed in MATLAB R2012b on a
computer with a 3.40 GHz CPU and 8 GB RAM. We used the CPLEX connector (V12.5.1) for MATLAB to solve
the linear subproblems. Specifically, problem (jH]) was programmed and solved by the CPLEX Class API and 
problem (26) was solved by the toolbox function cplexlp. CPLEX automatically chooses from primal simplex,
dual simplex and barrier optimizers to solve a given linear programming problem. In our implementations we
found that no instances of the linear subproblems were solved by the barrier optimizer. The implementation of
our algorithms requires high accuracy of the solution of linear subproblems. Hence the CPLEX tolerances of
optimality and feasibility are crucial to the performance of LPBC and LPBNC. As we see from the algorithms,
both the stopping criterion and the definition of $\rho$ are dependent on the model reduction, $f(x) - z^*$; if the $z^*$ provided
by the CPLEX solver is slightly bigger than the actual optimal value of (26), then the model reduction can be
significantly inaccurate. In fact, we observed instances of negative model reduction when we used the default
tolerances of optimality and feasibility in CPLEX. To prevent the occurrence of such cases, we set both the
optimality and feasibility tolerances to $10^{-9}$, the least value in CPLEX.
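The sensitivity described above can be illustrated with a small numerical sketch. All numbers below are hypothetical and chosen only for illustration (they are not taken from the experiments); the helper names `model_reduction` and `ratio` are likewise assumptions. The point is that, near convergence, the true model reduction can be smaller than the solver's error in $z^*$, so both the reduction and the ratio $\rho$ can change sign.

```python
def model_reduction(f_center, z):
    """Predicted reduction f(x) - z* used in the stopping test and in rho."""
    return f_center - z


def ratio(f_center, f_trial, z):
    """rho = actual reduction / predicted (model) reduction."""
    return (f_center - f_trial) / model_reduction(f_center, z)


# Hypothetical values near convergence.
f_center = 1.0
f_trial = 1.0 - 5e-9
z_exact = 1.0 - 1e-8          # exact LP optimal value (assumed)
z_loose = z_exact + 1e-6      # value returned with a loose solver tolerance (assumed)

exact_reduction = model_reduction(f_center, z_exact)   # small but positive
loose_reduction = model_reduction(f_center, z_loose)   # negative
rho_exact = ratio(f_center, f_trial, z_exact)          # positive
rho_loose = ratio(f_center, f_trial, z_loose)          # negative
```

With the loose value the step would be misclassified, which is consistent with the authors' choice to tighten the solver tolerances.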

In our experiments, the choice of the initial trust region radius $\Delta_0$ can significantly change the performance
of our algorithm on some problems. In most trust region methods, choosing the initial trust region radius is an
important issue; as stated in the monograph [5, page 784], "one very often has to resort to some heuristic to choose $\Delta_0$
on the basis of other initial information." Nonetheless, [3] suggested several strategies of initializing the trust region
and we adopted the following two choices, $\Delta_0 = 1$ and $\Delta_0 = \frac{1}{10}\|s^0\|$, where $s^0$ is an element of $\partial f(x^0)$. Settings
for other trust region related parameters in both LPBC and LPBNC are $\eta_1 = 10^{-4}$, $\eta_3 = 0.4$, $\alpha_1 = 0.25$, $\alpha_2 = 2$,
and $\Delta_{\max} = 1000$.

6.1 Convex Examples

Table [5] presents the results for the tested convex problems listed in Table [1] where problems 1 to 15 are taken 
from section 3 of [TB] and problems 16 to 20 are the problems 2.1 to 2.5 of m- For all convex and nonconvex 
problems in our tests we use the initial points provided in the associated references. The following abbreviations 


Table 1: Tested convex problems

No.  Problem                     Dimension  Optimal Value
1    CB2                         2          1.9522245
2    CB3                         2          2
3    DEM                         2          -3
4    QL                          2          7.2
5    LQ                          2          -1.4142136
6    Mifflin1                    2          -1
7    Wolfe                       2          -8
8    Rosen                       4          -44
9    Shor                        5          22.600162
10   Maxquad                     10         -0.8414083
11   Maxq                        20         0
12   Maxl                        20         0
13   Goffin                      50         0
14   MXHILB                      50         0
15   L1HILB                      50         0
16   Generalization of MAXQ      100        0
17   Generalization of MXHILB    100        0
18   Chained LQ                  100        -99√2
19   Chained CB3 I               100        198
20   Chained CB3 II              100        198

are used in Table 2:

fval   minimal function value returned by the algorithm,
nf     number of function evaluations used by the algorithm,
k      number of major iterations,
L      number of minor iterations,
time   elapsed CPU time,
t-CPX  sum of CPU time by the CPLEX solver,
Δ      final value of the trust region radius,
sh     number of times that the trust region radius is decreased,
pr     number of times that the primal simplex method is chosen by CPLEX,
dual   number of times that the dual simplex method is chosen by CPLEX.

In LPBC, we set $\epsilon_{tol} = 10^{-6}$ and $T = 30$. For problems 1 - 14 we initialize the trust region radius by 1; for
problems 15 - 20 we initialize the trust region radius by $\frac{1}{10}\|s^0\|$. From Table 2 we see that LPBC returned optimal
values to accuracy $10^{-6}$ for 14 problems, $10^{-4}$ for 4 problems and $2 \times 10^{-4}$ for 2 problems. For all problems
except the last three, CPLEX consumed negligible time to solve all the linear subproblems. The proportion of
iterations in which the trust region is shrunk (the value sh/L) is very small for the majority of the tested problems. We also see that
most of the linear subproblems were solved by the dual simplex method.
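The accuracy counts quoted above can be recomputed from the tabulated values. The sketch below is only a consistency check: the two lists are transcribed from the "Optimal Value" column of Table 1 and the "fval" column of Table 2, the optimal value of problem 18 is taken as $-99\sqrt{2}$, and the function name `accuracy_counts` and the three error bands are choices made here for illustration.

```python
import math

# Optimal values (Table 1) and returned values fval (Table 2), problems 1 to 20.
optimal = [1.9522245, 2.0, -3.0, 7.2, -1.4142136, -1.0, -8.0, -44.0,
           22.600162, -0.8414083, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
           -99.0 * math.sqrt(2.0), 198.0, 198.0]
fval = [1.952225451, 2.0, -3.0, 7.20000069, -1.41421274, -0.999999683,
        -8.0, -43.99998585, 22.60018019, -0.841407474, 4.06847e-07, 0.0,
        0.0, 2.35525e-07, 2.08721e-06, 4.21692e-07, 9.97664e-07,
        -140.0070287, 198.000171, 198.0000905]


def accuracy_counts(values, targets):
    """Count problems whose absolute error falls into each accuracy band."""
    errors = [abs(v - t) for v, t in zip(values, targets)]
    tight = sum(e <= 1e-6 for e in errors)
    medium = sum(1e-6 < e <= 1e-4 for e in errors)
    loose = sum(1e-4 < e <= 2e-4 for e in errors)
    return tight, medium, loose


counts = accuracy_counts(fval, optimal)
```

The three counts add up to the 20 tested problems, with the two largest errors occurring for problems 18 and 19.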

6.2 Nonconvex Examples 

The tested nonconvex problems are listed in Table [3] where problems 1 to 7 are taken from section 3 of [TB] and 
problems 8 to 12 are the problems 2.6 to 2.10 in m- 

In LPBNC, we set 7 = 2 and etoi = 10 Apart from the initial trust region radius Aq, another parameter 
that can cause dramatic changes in the performance of LPBNC is the backtrack parameter $\beta$. Results with
two settings of these parameters are listed in Tables 4 and 5 below. We use the same abbreviations as in Table
2. Additionally, we list the difference between fval and the optimal value (error), the number of function evaluations
Table 2: Results for convex problems

No.  fval          nf     k    L      time      t-CPX    $\Delta$     sh  pr  dual
1    1.952225451   16     10   5      0.1428    0        0.341842126  0   0   16
2    2             3      1    1      0.0977    0        0.5          1   0   3
3    -3            8      6    1      0.1097    0        2.039607805  0   2   6
4    7.20000069    16     10   5      0.1322    0        0.141421356  2   0   16
5    -1.41421274   18     14   3      0.1305    0        1.13137085   0   3   15
6    -0.999999683  28     12   15     0.1562    0        0.03125      6   5   23
7    -8            5      3    1      0.1021    0        2            1   0   5
8    -43.99998585  54     24   29     0.2361    0        0.25         4   1   53
9    22.60018019   55     22   32     0.2356    0        0.25         1   0   55
10   -0.841407474  220    34   185    0.7292    0        0.25         1   0   220
11   4.06847E-07   249    116  132    0.8042    0        2            3   0   249
12   0             36     16   19     0.1735    0        3.2          2   7   29
13   0             51     48   2      0.2316    0.016    4            0   1   50
14   2.35525E-07   15     10   4      0.1290    0        8.158764414  2   3   12
15   2.08721E-06   27     15   11     0.2028    0        0.13964447   6   1   26
16   4.21692E-07   1361   583  777    4.5585    0.079    40           2   1   1360
17   9.97664E-07   25     15   9      0.1571    0        2.045863824  5   3   22
18   -140.0070287  1185   91   1093   8.4852    3.661    0.124058958  3   0   1185
19   198.000171    1437   132  1304   10.4948   5.109    0.13978045   5   2   1435
20   198.0000905   35612  199  35412  285.4227  150.166  0.13978045   5   2   35610

Table 3: Tested nonconvex problems

No.  Problem               Dimension  Optimal Value
1    Crescent              2          0
2    Mifflin 2             2          -1
3    Colville 1            5          -32.348679
4    HS78                  5          -2.9197004
5    El-Attar              6          0.5598131
6    Gill                  10         9.7857721
7    Steiner 2             12         16.703838
8    Active Faces          50         0
9    Brown 2               50         0
10   Chained Mifflin 2     50         -34.795
11   Chained Crescent I    50         0
12   Chained Crescent II   50         0
in the backtrack process divided by the total number of function evaluations (pb, reported as a percentage), the
number of subgradient evaluations (se), the final value of $a$, the final value of $a_{\min}$ and the number of times
that $a$ is updated (au). Subgradient evaluations do not happen in the backtrack process and hence the total number
of subgradient evaluations is not equal to that of function evaluations. We found that all instances of LP
subproblems were solved by the dual simplex method in our tests for nonconvex problems. We see in Table 4 that
problems 8 and 9 have




Table 4: Results for nonconvex problems with $\Delta_0 = 1$, $\beta = 0.7$

No.  error      nf     pb     se    k    L     time     t-CPX   $\Delta$  sh  a         $a_{\min}$  au
1    0.000257   46     30.43  32    19   12    1.186    0       0.0625    3   0.90002   0.89339     3
2    0          20     10.00  18    4    13    0.260    0       0.03125   4   0         0           0
3    6.28E-07   46     19.57  37    14   22    0.561    0       0.125     3   0         0           0
4    6.5E-05    81     2.47   79    30   48    0.940    0       0.125     4   50.37528  48.16938    5
5    9.25E-07   99     9.09   90    35   54    1.214    0.031   0.125     8   0.40795   0.39540     9
6    0.0002196  205    1.95   201   30   170   4.602    0.076   0.015625  4   6.769637  6.769637    5
7    0.0001409  123    16.26  103   36   66    1.116    0.03    0.03125   3   0         0           0
8    0          2      0.00   2     1    0     0.278    0       2         0   0.118057  0.118057    1
9    0          2      0.00   2     1    0     0.121    0       2         0   0         0           0
10   4.05E-05   11549  69.81  3487  85   3401  345.646  16.931  0.003906  5   0         0           0
11   7.29E-06   5743   38.22  3548  189  3358  126.861  9.896   0.003906  5   0         0           0
12   1.01E-05   1637   54.31  748   72   675   15.801   1.199   0.000977  9   0         0           0


error 0 with two function evaluations. These two instances are accidental: we found that the optimal solution
of the first linear subproblem (26) is already the minimizer of the objective function for these two problems,
given that we set $\Delta_0 = 1$ and use the default initial points. We note that a huge negative error occurred


Table 5: Results for nonconvex problems with Aq = jh||sO||^ ^ q g 


No.  error       nf     pb     se    k    L     time     t-CPX   $\Delta$  sh  a         $a_{\min}$  au
1    0.0002435   40     37.50  25    19   5     0.311    0       0.10607   3   0.375     0.363       3
2    1.095E-05   20     5.00   19    6    12    0.247    0.016   0.03542   4   0         0           0
3    0.0001802   49     28.57  35    15   19    0.390    0       0.02140   6   0         0           0
4    -6.636E+19  15     0.00   15    14   0     0.215    0       1000      0   3.78E+12  2.81E+12    13
5    1.587E-05   437    48.97  223   105  117   2.738    0.015   0.13473   28  0.196     0.163       49
6    0.0002527   458    26.64  336   75   260   4.295    0.11    0.04621   6   7.995     0           15
7    0.0001122   118    14.41  101   33   67    1.071    0.03    0.06833   3   0         0           0
8    0.0003365   1028   24.42  777   73   703   18.732   1.076   0.00693   16  0.027472  0.027465    46
9    3.334E-06   657    34.70  429   36   392   9.414    0.712   0.00136   12  0.005     0.003       12
10   2.918E-05   22741  81.43  4222  88   4133  369.968  18.958  0.00544   7   0         0           0
11   6.576E-06   9530   51.29  4642  180  4461  201.156  14.651  0.00238   7   2.485     0           4
12   8.237E-06   1801   64.02  648   73   574   21.416   1.587   0.00059   9   0         0           0


in problem 4 in Table 5 because the problem HS78 is actually unbounded below and its optimal value in Table
3 is a local minimum. From Tables 4 and 5 we see that the proportion of function evaluations in the backtrack
process can be very high. Problem 10, which has the largest number of function evaluations, also has the
highest proportion of backtrack evaluations. For all the problems, solving LP subproblems took a small amount of time,
objective function. This special case appeared in all five tested Ferrier polynomials and Active Faces. We
ran LPBNC on these problems with all choices of dimension $n \in \{2, 10, 10^2, 10^3, 10^4, 10^5, 10^6\}$. The starting
point is $x^0 = [1, 1, \ldots, 1]$ and the initial trust region radius is $\Delta_0 = 1$. All these instances terminated
with the stopping criterion reached, fval = 0, and nf = 3.
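
The pb column of the nonconvex results appears to be recoverable from nf and se. Since subgradients are not evaluated during backtracking, nf - se counts the backtrack evaluations, under the assumption (ours, not stated in the text) that every function evaluation outside the backtrack process is paired with exactly one subgradient evaluation. The sketch below checks this reading against a few rows transcribed from Table 4; the function name is illustrative.

```python
def backtrack_percentage(nf, se):
    """Percentage of function evaluations spent in the backtrack process.

    Assumes each non-backtrack function evaluation comes with exactly one
    subgradient evaluation, so that nf - se counts backtrack evaluations.
    """
    return round(100.0 * (nf - se) / nf, 2)


# (nf, se, reported pb) for problems 1, 4, 5 and 10 of Table 4.
ROWS = [(46, 32, 30.43), (81, 79, 2.47), (99, 90, 9.09), (11549, 3487, 69.81)]

for nf, se, reported in ROWS:
    print(nf, se, backtrack_percentage(nf, se), reported)
```

For these rows the computed percentages agree with the reported pb values to two decimals, which supports reading pb as 100 (nf - se) / nf.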
7 Concluding remarks

We present a version of the bundle method with the unusual feature of using only an LP solver as its algorithmic
engine. We study the properties of the linear model and derive its model reduction. In contrast to the case of
a quadratic subproblem, the optimal solution of our linear subproblem is not unique. However, no information
that is significant for ensuring convergence of the algorithm is lost. We use a local convexification together
with the deletion of some cutting planes at the end of a major iteration. Preliminary numerical experiments show
that the algorithm is reliable and efficient for solving convex problems. For functions that are locally Lipschitz
continuous and prox-regular we show that, upon successful convexification, the algorithm can converge to a fixed
point of the proximal mapping. Numerical results for nonconvex problems suggest that, while an insignificant
amount of time is spent on solving LP subproblems, a large portion of function evaluations can be consumed in
the backtrack process. Improvements such as the incorporation of a line search can be studied in the future in
order to increase efficiency and enable the solution of large-scale problems.
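
To make the idea of a linear subproblem concrete, here is a one-dimensional sketch of minimizing a cutting-plane model over an infinity-norm trust region. It is not the LPBC or LPBNC implementation: in one dimension the LP reduces to comparing the interval endpoints with the points where two cuts cross, so no LP solver is needed, whereas the experiments above rely on CPLEX. All function names and the sample cuts are illustrative assumptions.

```python
from itertools import combinations


def model_value(cuts, x):
    """Cutting-plane model: max over cuts of f_i + s_i * (x - x_i)."""
    return max(f_i + s_i * (x - x_i) for x_i, f_i, s_i in cuts)


def minimize_model_1d(cuts, center, radius):
    """Minimize the model over the interval [center - radius, center + radius].

    The model is convex and piecewise linear, so a minimizer is attained at
    an endpoint of the interval or at a crossing point of two cuts.
    """
    lo, hi = center - radius, center + radius
    candidates = [lo, hi]
    for (xa, fa, sa), (xb, fb, sb) in combinations(cuts, 2):
        if sa != sb:
            # Solve (fa - sa*xa) + sa*x = (fb - sb*xb) + sb*x for x.
            x_cross = ((fb - sb * xb) - (fa - sa * xa)) / (sa - sb)
            if lo <= x_cross <= hi:
                candidates.append(x_cross)
    best = min(candidates, key=lambda x: model_value(cuts, x))
    return best, model_value(cuts, best)


# Two cuts of f(x) = |x|, stored as (point, function value, subgradient).
cuts = [(-1.0, 1.0, -1.0), (2.0, 2.0, 1.0)]
print(minimize_model_1d(cuts, center=2.0, radius=1.0))  # trust region active
print(minimize_model_1d(cuts, center=2.0, radius=3.0))  # kink inside region
```

With the small radius the model minimizer sits on the trust region boundary; with the larger radius it is the kink of the model, which illustrates why the radius update matters for the quality of a step.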
A Appendix

A.1 Proof for Theorem [2]

In order to prove Theorem [2] we need the following preliminaries. Let $X$ denote a space with norm $\|\cdot\|$. For
every $A \subseteq X$, $\alpha > 0$ and $x^* \in X^*$ we define the slice

$SL(x^*, A, \alpha) := \{x \in A \mid \langle x, x^*\rangle > S(A, x^*) - \alpha\}$,

where $S(A, x^*) := \sup\{\langle x^*, x\rangle \mid x \in A\}$. We denote a linear function on $X$ by $x^*$ or $\langle x^*, \cdot\rangle$ and its action
on an element $x$ by $x^*(x)$ or $\langle x^*, x\rangle$. Denote the set of all strongly exposed points of a set $C$ by $\exp C$ and the set
of all extremal points by $\operatorname{ext} C$. We have for any closed, proper convex function $h$

$\operatorname{epi} h = \overline{\operatorname{co}}(\operatorname{ext} \operatorname{epi} h) + \mathbb{R}_+ = \overline{\operatorname{co}}(\exp \operatorname{epi} h) + \mathbb{R}_+$.

Suppose {x,h{x)) € ext epi h then h(x) = 5(x).In this section we use the following definition of proximal 
mapping P\ (/) (x) := argmin {/(•) + ^\\x — -IP} and r(/) is the associated prox-threshold. It is known that 
if / : R” —7> R+oo is quadratically minorized by a — r|| • |p and x G dom/ with 0 < A < f(/, x) then for each 
/3 > / (x) we have 35 > 0 such that 

- ^^ ^^(T-2(5+%r^ ■■=Mxyye Pxif)iz) V z G ^^(x). (78) 
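
As a concrete illustration of the proximal mapping used in this section, the sketch below assumes the standard definition $P_\lambda(f)(x) = \operatorname{argmin}_y \{f(y) + \frac{1}{2\lambda}\|x - y\|^2\}$ and takes $f(y) = |y|$ in one dimension, for which the minimizer is given by the well-known soft-thresholding formula. The function names are illustrative and not part of the paper; the grid search is only a crude numerical check.

```python
def prox_abs(x, lam):
    """Closed-form proximal point of f(y) = |y| with parameter lam > 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def prox_by_grid(f, x, lam, lo=-5.0, hi=5.0, steps=10001):
    """Brute-force check: minimize f(y) + (x - y)^2 / (2*lam) on a grid."""
    best_y, best_val = lo, float("inf")
    for i in range(steps):
        y = lo + (hi - lo) * i / (steps - 1)
        val = f(y) + (x - y) ** 2 / (2.0 * lam)
        if val < best_val:
            best_y, best_val = y, val
    return best_y


print(prox_abs(3.0, 1.0), prox_by_grid(abs, 3.0, 1.0))
print(prox_abs(0.5, 1.0), prox_by_grid(abs, 0.5, 1.0))
```

A fixed point of this mapping, that is a point with $x = P_\lambda(f)(x)$, is $x = 0$, the minimizer of $|y|$; this is the kind of point the convergence result refers to.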

Lemma 10. Suppose $h : D \to \mathbb{R}$ is a lower bounded, real-valued, lower semi-continuous convex function
defined on a bounded closed convex set $D \subseteq \mathbb{R}^n$ with interior. Then $(y^*, -1)$ strongly exposes
$\operatorname{epi} h$ at $(y, h(y))$ if and only if $h - y^*$ achieves a strict minimum on $D$ at $y$. That is, $\operatorname{argmin}[h - y^*] = \{y\}$ or
$0 \in \partial[h - y^*](y)$ has a unique solution $y$.

Proof. Indeed ii h — y* achieves a strict minimum /3 := mmyg^ {hly) — {y*, y)} then we must have the lower 
level sets 

HP) ■={y&D\p> hljj) - {y*,y)} 

satisfying n^<^L(/3) = {y}- Otherwise there would exists ym G L(l3m) for Pm i P converging to y ^ y with 
y G C\p^pL{P) and by the lower continuity of h we have 

Yim.uA Pm = P> liminf {h{ym) - {y* .yH) > Kv) “ {y* ,y) 

m m 

implying y G argmin [h — y*). This contradicts the uniqueness of the minimizer. Thus the slices 
HP-^) = {y&D \ h{y) -{y*,y)+6> h{y) - {y*,y)} {y} 

as 5 I 0. Observe that ^(epi h, {—y*, 1)) = {{y, h{y)), {—y*, 1)) and so 

L{P + S) = {y e D \ {{y, h{y)), {-y*, 1)) > S'(epi h, {-y*, 1)) - 5} . 

Finally note that on epi h we have 

SL{{-y*,l),epi h,S) = {(y,a)Gepi h \ {{y,a), {-y*,1)) > S{epi h, {-y*,1)) - 6} 

(79) 

and diam5'L((—y*, l),epi h,S)< diamL(,S + 5) x 5 —>■ 0 as 5 | 0. 

If $(y^*, -1)$ strongly exposes $\operatorname{epi} h$ at $(y, h(y))$ then (79) defines a slice whose diameter tends to zero. The
projection of this slice onto $D$ then also has a diameter tending to zero, from which it follows that
$\operatorname{diam} L(\beta + \delta) \to 0$ as $\delta \downarrow 0$. Clearly we then have $\bigcap_{\delta > 0} L(\beta + \delta) = \{y\}$ and so $h - y^*$ achieves a strict minimum $\beta$

at $y$. □
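
A small worked example may help to fix the role of the shrinking level sets in Lemma 10. It is our own illustration under the simplest possible assumptions ($D = [-1, 1]$ and $y^* = 0$) and is not taken from the paper.

```latex
% Strict minimum: h(y) = y^2 on D = [-1, 1], y^* = 0, so \beta = \min_D h = 0.
\[
  L(\beta + \delta) = \{ y \in D \mid y^2 \le \delta \}
                    = [-\sqrt{\delta}, \sqrt{\delta}],
  \qquad
  \operatorname{diam} L(\beta + \delta) = 2\sqrt{\delta} \to 0
  \quad (\delta \downarrow 0,\ \delta \le 1).
\]
% Non-strict minimum: h(y) = \max\{ |y| - 1/2, 0 \} on the same D, again \beta = 0.
\[
  L(\beta + \delta) \supseteq [-\tfrac12, \tfrac12]
  \quad \text{for all } \delta > 0,
  \qquad
  \operatorname{diam} L(\beta + \delta) \ge 1 .
\]
```

In the first case the level sets collapse to the single minimizer, as the lemma requires for strong exposedness; in the second they do not, and no point of the flat piece is strongly exposed by the horizontal functional.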

Lemma 11. Suppose $h : D \to \mathbb{R}$ is a lower bounded, real-valued, lower semi-continuous, convex function
defined on a bounded closed convex set $D \subseteq \mathbb{R}^n$ with interior. Suppose $(x, h(x))$ is an extremal point of $\operatorname{epi} h$
and suppose {{xm, o:m)}m=o — ^ ^ Ofm = YH=o h{x). Then 

either A™ —>■ 0 or x™ —>■ x and hence for some i we have x™ —>■ x. 

Proof. We now claim that either A™ —?> 0 or x™ —>■ x for otherwise by a compactness argument (on [0,1] x D) 

we could extract a convergent pair of sub-sequences such that after renumbering we would have A™ —?> A ^ 0 

and x™ —>■ x' ^ X (note that A G [0,1] and x' G D). Then as A™x™ —>■ Ax' converges as does ^ 

we have convergence of Xj/i 2;™ to x" G P via 


+ (1 - AD 


E 


1-A” 


Ax' -I- (1 — A) x" = X. 
It is not possible for $\lambda = 1$ because this implies $x' = x$. Next note that


/ yrn \ 

KKxT) + (1 - AD E Hxf) ^ h{x) (80) 

and by lower semi-continuity of h we have liminfm h{x'^) > h{x'). If we take a subsequence along which the 
limit infimum liminfm is achieved then (IMl) implies -|limmfc ^^ converges along 

this subsequence as well. Let 

( 1 j!\^k ] = {x'\a”) e epi h, 

j/i ' * '' 

with h{x") < a" by the definition of epi h. Then from (IMl) and the lower semi-continuity of h again it follows 
that 

\h{x) + (1 — A) h{x") < h{x) 

Thus there exists {x',a'), {x”,a") G epi h such that X{x',a') + (1 — A) {x",a") = {x,h{x)) with 0 < A < 1 
contradicting the assumption that (cc, h(x)) is an extremal point of epi h. 

As either A™ —>■ 0 or a:™ —>■ a: it follows that for some i we have x^ ^ x since X]r=o ^ precludes all 

A™ from tending to zero. □ 

Proposition 2. Suppose $h : D \to \mathbb{R}$ is a lower bounded, real-valued, lower semi-continuous, convex function
defined on a bounded closed convex set $D \subseteq \mathbb{R}^n$ with interior. In addition suppose the strongly exposed points
of $h$ are dense on the boundary of $\operatorname{epi} h|_C$, where $C \subseteq D$ and $C$ is closed with $\operatorname{int} C \neq \emptyset$. Define $g$ via
$\operatorname{epi} g := \exp \operatorname{epi} h|_C$; then

$h(x) = g(x)$ for $x \in C$.

Proof. On $C$ take $(x, h(x)) \in \exp \operatorname{epi} h$ and any supporting hyperplane generated by $(-y^*, 1)$. By definition
of exposedness we have


$\operatorname{epi} h \cap \{(y, \alpha) \mid \langle (-y^*, 1), (x, h(x))\rangle \ge \langle (-y^*, 1), (y, \alpha)\rangle\} = \{(x, h(x))\}$.


Now 


epi h|c = CO expepi/i|c = n(2;,,j(a;))gexpepi/i{(2/, a) I ((-2/*,l),(a;D(a;))) < ((-y*,l),(y,a))}n [C X R] 

xGC 

C epi h. 

Now suppose there exists {y, a) G co expepi/i|c O (epi gY. Using the density of exp epi h\c we have 

9^y) = , h{x). 

{ic,n(rc))Gexpepi n 
xGC, x—^y 


As (x, h (x)) G exp epi h C epi/i with x G C and [y, a) G co exp epi h\c implies 
((-y*> 1): (x, h{x))) < {{-y*, 1), {y, a)) we have 


liminf {{-y*,l),ix,h{x))) = {{-y*,1), {y, g{y))) < {{-y*,l),{y,a)). 

j^expepi n 
x^C, x—^y 

Thus g{y) < a or (y,a) G epi g, a contradiction. Hence co exp epi h\c (~l [C x R] C epi g giving the equality 
epi h\c = CO exp epi h\c D [C x R] = epi g. □ 

Corollary 1. Suppose $g : D \to \mathbb{R}$ is a lower bounded, real-valued continuous function defined on a bounded
closed set $D \subseteq \mathbb{R}^n$ with interior. If the strongly exposed points of $\operatorname{epi} g$ are dense in the boundary of $\operatorname{epi} g$
then $g(x) = \operatorname{co} g(x)$ for all $x \in D$ and hence $g$ is the restriction to $D$ of a convex function. In particular this
is true when $0 \in \partial[g - y^*](y)$ has a unique solution for all $y^* \in B_1(0)$ such that $(-y^*, 1)$ supports $\operatorname{epi} g$.
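Proof. Let $\operatorname{epi} h := \operatorname{co} \operatorname{epi} g = \operatorname{epi} \operatorname{co} g$, which has a convex domain $\operatorname{co} D$ with $\operatorname{int} \operatorname{co} D \supseteq \operatorname{int} D \neq \emptyset$. Clearly a
strongly exposed point of $h$ must be a strongly exposed point of $g$. We claim the converse is true. Let $(x, g(x))$
be strongly exposed in $\operatorname{epi} g$. Suppose $(x, h(x))$ is not strongly exposed in $\operatorname{epi} h$; then there exists $(-y^*, 1)$
supporting $\operatorname{epi} h$ at $(x, h(x))$ such that for some $\delta > 0$ and all $\epsilon > 0$ there exists $(y^\epsilon, h(y^\epsilon)) \in \operatorname{epi} h$ with
$\|(y^\epsilon, h(y^\epsilon)) - (x, h(x))\| \ge \delta > 0$ and

$\langle (-y^*, 1), (x, h(x))\rangle + \epsilon > \langle (-y^*, 1), (y^\epsilon, h(y^\epsilon))\rangle$.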

As g{x)>h (x) and h (y^) = X;r=o ^*5 (xl) for some > 0 and y^ = Xr=o have 

n 

{{-y* A),{x,g{x))) + e>^\l{{-y* ,l),{xl,g{xl))) 

n 

or ^Af((-?/*,l),(a;, 5 (a;)) - (a;f,g«))) > -e. 

i=0 

As (—y*, 1 ) supporting epi h we have {{—y*, 1), {x,g{x)) — {xf,g (a;?))) < 0 for all i. Thus for all i with Af > 0 
we have 


K{i-y*A)Ax,gix)) - {xl,g{xi))) 

n 

^ '^>H{{-y* ^^)Ax, g{x)) - {xl,g{xi))) > -e. 

i =0 

We have {{—y*, 1), {x,g{x)) — {xf,g{x^))) > —e/Af for all e > 0 and i. Taking a convergent subsequence for 
e 4 , 0 we may consider Af —>■ A^ > 0 with xf ^ x and note that ((—y*, 1), {x,g{x)) — {xi,g (xt))} > 0 implies 
Xi = X as (x,g (x)) is strongly exposed in epi g. As we only require to consider convex combination of length 
n + 1 (Caratheodory’s theorem) we have (using x^ G D a bounded set and Af —?> 0 if a;f 74 x) we obtain 

n n / ^ \ 

y"" = XI ^ X ^ (X ^ 

2—0 2—0 \2 = 0 / 

contradicting ||(y®, h(y^)) — (x, h (a;))|| > (5 > 0 for all e > 0. 

Hence the strongly exposed points of $h$ are dense in $\operatorname{epi} h$. Now the extremal points of $h$ are contained in
the convex closure of the strongly exposed points, i.e. $\overline{\operatorname{co}}(\operatorname{ext} \operatorname{epi} h) = \overline{\operatorname{co}}(\exp \operatorname{epi} g)$. Using Lemma 11, the
extremal points are actually limits of strongly exposed points. As strongly exposed points are also extremal
points, we find that the extremal points of $h$ are dense in $\operatorname{epi} h$. For all $(x, h(x)) \in \operatorname{ext} \operatorname{epi} h = \operatorname{ext} \operatorname{co}(g)$ we
have $h(x) = g(x)$. Thus $h$ and $g$ coincide on the dense set of exposed points of $\operatorname{epi} h$ (and $\operatorname{epi} g$). Now apply
Proposition 2 to deduce that $\operatorname{epi} h|_D = \exp \operatorname{epi} g|_D$. That is, for $y \in D$ we have

$h(y) = \liminf_{(x, h(x)) \in \exp \operatorname{epi} h,\ x \in D,\ x \to y} g(x) \ge g(y)$.

But by construction $g(y) \ge h(y)$, so $g(y) = h(y) = \operatorname{co} g(y)$ for $y \in D$. Apply Lemma 10 for the last
observation. □
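
The conclusion $g = \operatorname{co} g$ can be illustrated numerically in one dimension. The sketch below is a discrete analogue only, not part of the proof: it computes the lower convex envelope of sampled points with a standard monotone-chain scan and compares it with the samples. For a convex function every sample stays on the envelope, while for a double-well function the envelope drops below the function between the wells. All names are illustrative.

```python
def lower_convex_envelope(xs, gs):
    """Vertices of the lower convex envelope of the points (xs[i], gs[i]).

    xs must be strictly increasing. A monotone-chain scan keeps only
    strict left turns, so the slopes between kept vertices increase.
    """
    hull = []
    for p in zip(xs, gs):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross <= 0:
                hull.pop()  # middle point lies on or above the chord
            else:
                break
        hull.append(p)
    return hull


def envelope_value(hull, x):
    """Evaluate the piecewise-linear envelope at x by interpolation."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return y1 + (y2 - y1) * (x - x1) / (x2 - x1)
    raise ValueError("x lies outside the sampled interval")


xs = [0.5 * i for i in range(-3, 4)]          # -1.5, -1.0, ..., 1.5
convex = [x * x for x in xs]                  # convex: envelope equals g
double_well = [(x * x - 1.0) ** 2 for x in xs]  # nonconvex: envelope below g

print(len(lower_convex_envelope(xs, convex)) == len(xs))
hull = lower_convex_envelope(xs, double_well)
print(hull, envelope_value(hull, 0.0))
```

In the convex case all sampled points are vertices of the envelope, mirroring the density of exposed points assumed in Corollary 1; in the double-well case the samples between the two wells are discarded.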


Let y > 0 be the threshold value for which Proposition |T] holds. Take g > fj. We wish to the Minty 
parametrization of a maximal monotone operator. Consider y* (a;) = {x, z) and the problem of minimizing 
y (y) — (y, z). The optimality conditions are 


or 


0 G 

y e 


df (y) + g{y-x°) -z 
[df + gl] (y) - {gx° + z) 


[df + gl] My 


1 

V . 


(81) 


As z is arbitrary we may choose x^ + ^Zx 
become 


= X G levjjo/ OT Zx = g {x — x°). 
yG [df + gl]~^ {gx). 


The optimality conditions then 
On the other hand consider the problem $y \in P_\lambda(x)$, which has optimality conditions


or 


0 e 
y e 


df (y) + jiy 

r 1 1 

dfPji 



( 82 ) 


We may then see the correspondence under the identihcation V = We may rewrite (1821) as 

y G [dg+{y-f])I]~^{y{x-x°)) (83) 

= [dg + {y - y) i]~^ {z^) 

By (I7R)) there we may take rj larger if needed to ensure for all x G (x) and any x G leVa;o/ we have 
y G Pi (x) contained in (x) O Bs„ (x) where g (x) := f (x) + ^ lla; — is convex on Bg (x) D B^ (xi). 

77 V V z M M ^ 

Assuming lev^jo/ is bounded we may extract a finite sub-cover of the open cover 

{Bg^ {x)\x G hv^of} . 

Let 

{Bg. (x^) \ i = l,...,k} 

be this finite cover. Then for any x G lev^^o/ we have x G Bg. (xi) and hence y G Pi (x) is contained in some 
Bei {xi) O Bg. [xi] on which g is convex. Then (1551) applies so via Minty’s resolvant theorem 

yGT{g[x- x°)) := [dg + {y - y) I]~'^ [g [x - x°)) (84) 

where T {x) := [dg + {y — y) I] ^ (g (a; — a;°)) is a single valued maximal monotone and nonexpansive when dg 
is maximal monotone, see |221 Theorem 12.15]. 

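Theorem 6. Suppose $f$ is prox-regular and locally Lipschitz on a bounded level set $\operatorname{lev}_{x^0} f$ with $\operatorname{int} \operatorname{lev}_{x^0} f \neq \emptyset$.
Let $g(y; x, a)$ be defined in (17) with $a > 0$. There exist a threshold value of $a$ and a globally convex function
$H(y; x, a)$ satisfying $g(y; x, a) \ge H(y; x, a)$ for all $y \in \mathbb{R}^n$, all $a$ above this threshold and all $x \in \operatorname{lev}_{x^0} f$,
such that $g(y; x, a)$ is the restriction to $\operatorname{lev}_{x^0} f$ of $H(y; x, a)$.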

Proof. We wish to apply Corollary [T] We take D = lev^of and g :£>—?> R is Lipschitz and prox-regular on the 
bounded domain D. We need to show that 0 G 9 [g — (z, •)] (y) has a unique solution for all y* G i?i (0) such that 
{—z, 1) supports epi g at some x G D. To this end take x G D and let Zx = y {x — a:°) so that g {y) — {y, z) attains 
its minimum at y when (1841) holds. But by construction we have x G Bg^ {xi) and hence y G Pi (x) is contained 

V 

in some B^^ [xi) A Bg. {xi) on which g is convex. Hence the operator T (x) := [dg + {g — g) I]~ {y {x — a;°)) 
will have Bg. (xi) in its domain and Bg. (xi) in its range, all contained in a region on which dg is locally a 
maximal monotone operator (as g is locally convex). Thus y = T (x) is unique by [221 Theorem 12.15]. Thus 
{x,g{x)) is exposed by {—Zx, 1) and Corollary [1] applies. □ 